Composite biomarkers for immunotherapy for cancer

ABSTRACT

Methods for generating a composite biomarker that identifies a predicted level of responsiveness of a subject to a particular type of an immunotherapy treatment are provided. A method can include generating genomic metrics that represent one or more characteristics corresponding to one or more DNA sequences. The method can also include generating transcriptomic metrics that represent one or more characteristics corresponding to a set of peptides that are translated from a corresponding RNA sequence of one or more RNA sequences. The method can also include generating a composite biomarker score derived from the set of genomic metrics and the set of transcriptomic metrics. The method can also include determining, based on the composite biomarker score, a predicted level of responsiveness of the subject to a particular type of an immunotherapy treatment.

CROSS-REFERENCES TO RELATED APPLICATIONS

The present application is a continuation of International Application No. PCT/US2021/029684, filed on Apr. 28, 2021, and claims priority to U.S. Provisional Patent Application No. 63/017,542, filed on Apr. 29, 2020, and U.S. Provisional Patent Application No. 63/040,943, filed on Jun. 18, 2020. Each of the applications is hereby incorporated by reference herein in its entirety for all purposes.

FIELD

This disclosure generally relates to systems and methods for determining composite biomarkers based on genomic and transcriptomic metrics derived from a biological sample. More specifically, but not by way of limitation, this disclosure relates to determining, based on the genomic and transcriptomic metrics, a composite biomarker score that identifies a predicted level of responsiveness of a subject to a particular type of an immunotherapy treatment.

BACKGROUND

Immunotherapies are used in the treatment of many cancers and autoimmune conditions. While immune checkpoint blockade therapy is known as an effective type of cancer treatment for a variety of malignancies, diagnostic biomarkers that consistently predict subject response to these therapies have remained elusive. Given the highly variable and complex nature of immune-system resistance to immunotherapy, as well as potential toxicities associated with treatment, it can be challenging to accurately predict therapeutic response to certain immunotherapies.

Immunogenomics has emerged as a technique that can determine therapeutic efficacy of immunotherapies. Such a technique can lead to a determination of an effective treatment of cancers and may contribute to discovery of several new therapeutics, diagnostics, and processes. For example, immunogenomics can be used to identify neoantigens, which can contribute to the development of precision cancer therapeutics and diagnostics. In addition, genomic data such as variant calls may provide insight into complex immune system responses and resistance to cancer immunotherapies. However, conventional techniques using targeted diagnostic cancer panels provide a limited amount of data, which can be unreliable for development of integrative, composite biomarkers.

BRIEF SUMMARY

In some embodiments, a method and system for determining a composite biomarker score that identifies a predicted level of responsiveness of a subject to a particular type of an immunotherapy treatment are provided. An immunogenomics-analysis system accesses genomic data and transcriptomic data that were generated by processing a biological sample of a subject. In some instances, the biological sample includes one or more cancer cells. The genomic data can identify one or more DNA sequences in the biological sample, in which whole-exome sequencing can be performed to identify the one or more DNA sequences. The transcriptomic data can identify one or more RNA sequences in the biological sample, in which transcriptome sequencing can be used to identify the one or more RNA sequences. Additionally or alternatively, the genomic and the transcriptomic data can be generated from a sample pair that includes the biological sample and a reference biological sample of the subject, in which the reference biological sample does not include the one or more cancer cells.

The immunogenomics-analysis system processes the genomic data to generate a set of genomic metrics. Each of the set of genomic metrics can represent one or more characteristics corresponding to a corresponding DNA sequence of the one or more DNA sequences. In some instances, the set of genomic metrics includes: (i) a quantitative or categorical metric that represents one or more characteristics for each of one or more somatic mutations in the one or more DNA sequences; (ii) a categorical metric that indicates whether a loss of heterozygosity has occurred in at least one human leukocyte antigen (HLA) gene of the biological sample; and (iii) a quantitative or categorical metric that represents a predicted tumor mutational burden. With respect to the HLA loss of heterozygosity, the corresponding categorical metric can be generated by applying the genomic data to an HLA-deletion-identification machine-learning model.

The immunogenomics-analysis system processes the transcriptomic data to generate a set of transcriptomic metrics. Each of the set of transcriptomic metrics can represent one or more characteristics corresponding to a set of peptides that are translated from a corresponding RNA sequence of the one or more RNA sequences. In some instances, the set of transcriptomic metrics includes: (i) a quantitative or categorical metric that represents a predicted neoantigen burden of the biological sample; (ii) a quantitative or categorical metric that represents one or more characteristics of each of one or more candidate neoantigens detected from the biological sample; (iii) a quantitative or categorical metric that represents one or more characteristics of each of one or more HLA proteins for which a loss of cell-surface presentation is detected; (iv) a quantitative or categorical metric that represents one or more characteristics corresponding to an HLA gene that encodes the one or more HLA proteins for which the loss of cell-surface presentation was detected; (v) a quantitative or categorical metric that represents an expression level of a sequence corresponding to an immune cell; and (vi) a quantitative or categorical metric that represents an expression level of one or more T-cell receptors detected from the biological sample. With respect to the HLA proteins for which a loss of cell-surface presentation is detected, the corresponding metric can be generated by applying the genomic and transcriptomic data to a neoantigen-presentation-prediction machine-learning model.

The immunogenomics-analysis system generates a composite biomarker score derived from the set of genomic metrics and the set of transcriptomic metrics and determines, based on the composite biomarker score, a predicted level of responsiveness of the subject to a particular type of an immunotherapy treatment. In some instances, the immunogenomics-analysis system generates the composite biomarker score by: (i) weighting each genomic metric of the set of genomic metrics with a weight value determined based on a corresponding transcriptomic metric of the set of transcriptomic metrics; and (ii) generating the composite biomarker score using the weighted genomic metrics.

The immunogenomics-analysis system outputs a result that corresponds to the predicted level of responsiveness of the subject. The result can be a report that identifies, based on the predicted level of responsiveness of the subject to the particular treatment: (i) a treatment recommendation of the particular treatment; (ii) a recommendation to administer the particular treatment to the human subject; and/or (iii) a recommendation to not administer the particular treatment to the human subject. In some embodiments, the recommended treatment is administered to the human subject.

In some embodiments, a computer-program product is provided that is tangibly embodied in a non-transitory machine-readable storage medium and that includes instructions configured to cause one or more data processors to perform part or all of one or more methods disclosed herein.

Some embodiments of the present disclosure include a system including one or more data processors. In some embodiments, the system includes a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein. Some embodiments of the present disclosure include a computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform part or all of one or more methods and/or part or all of one or more processes disclosed herein.

The terms and expressions which have been employed are used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention as claimed has been specifically disclosed by embodiments and optional features, modification and variation of the concepts herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention as defined by the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Features, embodiments, and advantages of the present disclosure are better understood when the following Detailed Description is read with reference to the following figures. The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

FIG. 1 shows an example of a schematic diagram for generating genomic data and transcriptomic data from a biological sample, according to some embodiments.

FIGS. 2A-B show statistical data corresponding to oncogenic changes in genomic and transcriptomic data corresponding to subjects of a clinical cohort.

FIGS. 3A-C show statistical data corresponding to transcriptomic metrics that identify differentially expressed genes that are associated with immune-system response.

FIG. 4 shows statistical data corresponding to a normalized enrichment score for each differentially regulated immune pathway.

FIGS. 5A-C show statistical data corresponding to transcriptomic metrics that identify expression levels of T-cell receptors.

FIG. 6 shows a set of box plots that identify a comparison of enrichment scores between a first group of responsive subjects and a second group of non-responsive subjects.

FIGS. 7A-B show statistical data corresponding to transcriptomic metrics that identify neoantigen burden across various genes and disease sites.

FIGS. 8A-F show statistical data identifying neoantigen burden scores across various subjects, in which the neoantigen burden score can be predictive of responsiveness of subjects treated with immunotherapies.

FIGS. 9A-F show statistical data that identify one or more characteristics relating to mutations present in each subject sample of the discovery cohort.

FIG. 10 shows sets of box plots that identify tumor mutational burden across various driver mutations, disease sites, and subject groups.

FIGS. 11A-D show statistical data identifying composite biomarker scores across various subjects, in which the composite biomarker scores indicate improved performance in predicting responsiveness of subjects treated with immunotherapies.

FIGS. 12A-B show statistical data identifying composite biomarker scores across various subjects, in which the composite biomarker scores indicate improved performance in predicting progression-free and overall survival rates of subjects in the cohort.

FIGS. 13A-B show statistical data that identify somatic mutations to HLA genes that may contribute to a decreased probability of neoantigen presentation.

FIGS. 14A-B show examples of sets of panels that identify a comparison of HLA sequences between a normal sample and a corresponding tumor sample of a particular subject.

FIG. 15 includes a flowchart illustrating an example of a method of generating a composite biomarker score, according to some embodiments.

DETAILED DESCRIPTION

I. Overview

As described above, efficacy of checkpoint inhibitor therapy can depend on various biological factors, including complex interactions between the tumor, a corresponding tumor microenvironment, and a corresponding immune system. Numerous biomarkers for identifying immune-system responses to immunotherapies have been discussed, including PD-L1 expression, interferon (IFN)-γ based signatures, tumor mutational burden, mismatch repair deficiency, genetic alterations including those within the antigen presenting machinery, HLA loss of heterozygosity, and T-cell repertoire diversity.

As shown by the diverse biological factors that can influence the immune-system response to immune checkpoint blockade therapy, there has been increasing effort toward an integrated biomarker that can incorporate various biological factors and accurately predict immune-system response to immunotherapies. For example, conventional techniques have combined information corresponding to immunogenicity and neoantigen clonal structures of a sample to predict the immune-system response to immune checkpoint blockade. The results generated by these conventional techniques have attempted to determine a prognosis in subjects with melanoma, lung cancer, and kidney cancers. While these conventional techniques have yielded somewhat positive results, the conventional techniques still fall short in generating data that can consistently and accurately predict immune-system response. This challenge can be attributed to the complex mechanisms that drive immune response to tumors. Moreover, these conventional techniques require a large number of samples from the subject, which can be invasive and difficult to obtain in some circumstances (e.g., due to the age of the subject or pregnancy).

To address at least the above deficiencies of conventional systems, the present techniques can be used to determine a composite biomarker score that identifies a predicted level of responsiveness of a subject to a particular type of an immunotherapy treatment. An immunogenomics-analysis system accesses genomic data and transcriptomic data that were generated by processing a biological sample of a subject. In some instances, the biological sample includes one or more cancer cells. The genomic data can identify one or more DNA sequences in the biological sample, in which whole-exome sequencing can be performed to identify the one or more DNA sequences. The transcriptomic data can identify one or more RNA sequences in the biological sample, in which transcriptome sequencing can be used to identify the one or more RNA sequences. Additionally or alternatively, the genomic and the transcriptomic data can be generated from a sample pair that includes the biological sample and a reference biological sample of the subject, in which the reference biological sample does not include the one or more cancer cells.

The immunogenomics-analysis system processes the genomic data to generate a set of genomic metrics. Each of the set of genomic metrics can represent one or more characteristics corresponding to a corresponding DNA sequence of the one or more DNA sequences. In some instances, the set of genomic metrics includes: (i) a quantitative or categorical metric that represents one or more characteristics for each of one or more somatic mutations in the one or more DNA sequences; (ii) a categorical metric that indicates whether a loss of heterozygosity has occurred in at least one human leukocyte antigen (HLA) gene of the biological sample; and (iii) a quantitative or categorical metric that represents a predicted tumor mutational burden. With respect to the HLA loss of heterozygosity, the corresponding categorical metric can be generated by applying the genomic data to an HLA-deletion-identification machine-learning model.
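By way of illustration only, the tumor mutational burden metric of item (iii) could be sketched as follows. The disclosure does not specify how the burden is computed; this sketch assumes the common convention of counting somatic mutations per megabase of covered sequence. The `SomaticMutation` type, the `is_nonsynonymous` flag, and the cutoff of 10 mutations per megabase are hypothetical and are not taken from the present disclosure.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SomaticMutation:
    """Hypothetical record for one somatic mutation call."""
    gene: str
    is_nonsynonymous: bool


def tumor_mutational_burden(
    mutations: Iterable[SomaticMutation],
    covered_megabases: float,
    nonsynonymous_only: bool = True,
) -> float:
    """Return mutations per megabase (assumed convention, not from the source)."""
    if covered_megabases <= 0:
        raise ValueError("covered_megabases must be positive")
    counted = [
        m for m in mutations
        if m.is_nonsynonymous or not nonsynonymous_only
    ]
    return len(counted) / covered_megabases


def categorize_tmb(tmb: float, high_threshold: float = 10.0) -> str:
    """Map the quantitative metric to a categorical one (hypothetical cutoff)."""
    return "high" if tmb >= high_threshold else "low"
```

The same value can thus be reported either as a quantitative metric (mutations per megabase) or as a categorical metric (high or low), consistent with the "quantitative or categorical" language above.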

The immunogenomics-analysis system processes the transcriptomic data to generate a set of transcriptomic metrics. Each of the set of transcriptomic metrics can represent one or more characteristics corresponding to a set of peptides that are translated from a corresponding RNA sequence of the one or more RNA sequences. In some instances, the set of transcriptomic metrics includes: (i) a quantitative or categorical metric that represents a predicted neoantigen burden of the biological sample; (ii) a quantitative or categorical metric that represents one or more characteristics of each of one or more candidate neoantigens detected from the biological sample; (iii) a quantitative or categorical metric that represents one or more characteristics of each of one or more HLA proteins for which a loss of cell-surface presentation is detected; (iv) a quantitative or categorical metric that represents one or more characteristics corresponding to an HLA gene that encodes the one or more HLA proteins for which the loss of cell-surface presentation was detected; (v) a quantitative or categorical metric that represents an expression level of a sequence corresponding to an immune cell; and (vi) a quantitative or categorical metric that represents an expression level of one or more T-cell receptors detected from the biological sample. With respect to the HLA proteins for which a loss of cell-surface presentation is detected, the corresponding metric can be generated by applying the genomic and transcriptomic data to a neoantigen-presentation-prediction machine-learning model.

The immunogenomics-analysis system generates a composite biomarker score derived from the set of genomic metrics and the set of transcriptomic metrics and determines, based on the composite biomarker score, a predicted level of responsiveness of the subject to a particular type of an immunotherapy treatment. In some instances, the immunogenomics-analysis system generates the composite biomarker score by: (i) weighting each genomic metric of the set of genomic metrics with a weight value determined based on a corresponding transcriptomic metric of the set of transcriptomic metrics; and (ii) generating the composite biomarker score using the weighted genomic metrics.
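As a non-limiting sketch of steps (i) and (ii), each genomic metric can be paired with its corresponding transcriptomic metric, converted to a weight, and combined. The disclosure does not state how the weight is derived or how the weighted metrics are aggregated; the clamping function `weight_from_transcriptomic` and the simple summation below are assumptions chosen only to make the example concrete.

```python
from typing import Sequence


def weight_from_transcriptomic(value: float) -> float:
    """Hypothetical weight mapping: clamp the transcriptomic metric to [0, 1]."""
    return min(1.0, max(0.0, value))


def composite_biomarker_score(
    genomic_metrics: Sequence[float],
    transcriptomic_metrics: Sequence[float],
) -> float:
    """Weight each genomic metric by its paired transcriptomic metric, then sum.

    Summation is an assumed aggregation; the source only says the score is
    generated "using the weighted genomic metrics".
    """
    if len(genomic_metrics) != len(transcriptomic_metrics):
        raise ValueError("each genomic metric needs a corresponding transcriptomic metric")
    weighted = [
        g * weight_from_transcriptomic(t)
        for g, t in zip(genomic_metrics, transcriptomic_metrics)
    ]
    return sum(weighted)
```

For example, genomic metrics of 2.0 and 4.0 paired with transcriptomic metrics of 0.5 and 1.5 would be weighted by 0.5 and 1.0 under this assumed mapping.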

The immunogenomics-analysis system outputs a result that corresponds to the predicted level of responsiveness of the subject. The result can be a report that identifies, based on the predicted level of responsiveness of the subject to the particular treatment: (i) a treatment recommendation of the particular treatment; (ii) a recommendation to administer the particular treatment to the human subject; and/or (iii) a recommendation to not administer the particular treatment to the human subject. In some embodiments, the recommended treatment is administered to the human subject.
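One possible way to turn a composite biomarker score into such a report is sketched below. The `ResponseReport` structure, the single cutoff of 0.5, and the wording of the recommendations are hypothetical placeholders; the disclosure does not specify a threshold or a report format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseReport:
    """Hypothetical report structure for the output result."""
    score: float
    predicted_responsiveness: str
    recommendation: str


def build_report(score: float, treatment: str, threshold: float = 0.5) -> ResponseReport:
    """Map a composite score to a recommendation using an assumed cutoff."""
    if score >= threshold:
        level = "likely responsive"
        recommendation = f"administer {treatment}"
    else:
        level = "likely non-responsive"
        recommendation = f"do not administer {treatment}"
    return ResponseReport(score, level, recommendation)
```

In practice, a cutoff of this kind would need to be established from clinical data for the particular treatment rather than fixed in advance.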

Accordingly, embodiments of the present disclosure provide a technical advantage over conventional techniques by generating a composite biomarker score based on a validated, enhanced exome- and transcriptome-based tumor profiling platform. In particular, the composite biomarker score can be determined from metrics that represent characteristics of various tumor and immune-related molecular mechanisms, while minimizing the amount of biological sample used to generate the metrics. Such techniques could improve the accuracy of diagnostic, prognostic and/or treatment recommendations for the corresponding subject, without requiring an invasive procedure of obtaining a large amount of biological samples. Therefore, embodiments of the present disclosure provide a composite immunogenomics framework for accurately predicting a response to immunotherapy treatments by identifying biological mechanisms that drive the response and resistance to such therapies.

While various embodiments of the invention(s) of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention(s). It should be understood that various alternatives to the embodiments of the invention(s) described herein may be employed in practicing any one of the invention(s) set forth herein.

II. Definitions

While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.

As used herein, the term “cancer” or “malignancy” generally refers to a collection of related diseases where the body's cells divide without stopping and spread into surrounding tissues. Cancer can start almost anywhere in the body and develops when the orderly process of removing and replacing old, abnormal, or damaged cells is disrupted, and these cells survive when they should die or new cells form when they are not needed. These cells divide without stopping and are able to spread into and invade both nearby and distant tissues from their origin point.

As used herein, the term “neoantigen” generally refers to newly formed antigens that have not been previously recognized by the immune system. Neoantigens can arise from altered tumor proteins formed as a result of tumor mutations. Neoantigens may constitute the subset of somatic mutations that can be loaded onto MHC class I and class II molecules and presented to T cells. These neoantigens can be seen by the immune system as endogenous tumor-specific (non-self) targets.

As used herein, the term “tumor microenvironment” refers to the environment around a tumor including the surrounding blood vessels, immune cells, fibroblasts, signaling molecules, and extracellular matrix. A tumor and its microenvironment are closely related and interact constantly with dynamic reciprocity. Tumor progression is influenced by interactions of cancer cells with their environment, which also shape therapeutic responses and resistance.

As used herein, the term “biomarker” refers to a metabolite or small molecule derived therefrom, that is differentially present (i.e., increased or decreased) in a biological sample from a subject or a group of subjects having a first phenotype (e.g., having a disease) as compared to a biological sample from a subject or group of subjects having a second phenotype (e.g., not having the disease). A biomarker may be differentially present at any level, but is generally present at a level that is increased by at least 5%, by at least 10%, by at least 15%, by at least 20%, by at least 25%, by at least 30%, by at least 35%, by at least 40%, by at least 45%, by at least 50%, by at least 55%, by at least 60%, by at least 65%, by at least 70%, by at least 75%, by at least 80%, by at least 85%, by at least 90%, by at least 95%, by at least 100%, by at least 110%, by at least 120%, by at least 130%, by at least 140%, by at least 150%, or more; or is generally present at a level that is decreased by at least 5%, by at least 10%, by at least 15%, by at least 20%, by at least 25%, by at least 30%, by at least 35%, by at least 40%, by at least 45%, by at least 50%, by at least 55%, by at least 60%, by at least 65%, by at least 70%, by at least 75%, by at least 80%, by at least 85%, by at least 90%, by at least 95%, or by 100% (i.e., absent). A biomarker is preferably differentially present at a level that is statistically significant.
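As a simple illustration of the percentage thresholds recited above, the relative change of a biomarker level between a reference sample and a test sample could be computed as sketched below. The function names and the default 5% cutoff are hypothetical, and the sketch does not address statistical significance.

```python
def percent_change(reference_level: float, sample_level: float) -> float:
    """Percent increase (positive) or decrease (negative) relative to the reference."""
    if reference_level <= 0:
        raise ValueError("reference_level must be positive")
    return (sample_level - reference_level) * 100.0 / reference_level


def is_differentially_present(
    reference_level: float,
    sample_level: float,
    min_percent: float = 5.0,
) -> bool:
    """True if the level rose or fell by at least min_percent (assumed cutoff)."""
    return abs(percent_change(reference_level, sample_level)) >= min_percent
```

Under this sketch, a level that falls from a positive reference value to zero corresponds to a 100% decrease, matching the "absent" case noted above.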

As used herein, the term “level,” when referring to one or more biomarkers, means the absolute or relative amount or concentration of the biomarker in the sample.

As used herein, the term “reference profile” refers to the metabolic profile that is indicative of a healthy subject or one or more of a disease state, condition or body disorder. Within the reference profile, there will be reference levels of one or more biomarkers (metabolites or small molecules derived therefrom) that may be an absolute or relative amount or concentration of the one or more biomarkers, a presence or absence of the one or more biomarkers, a range of amount or concentration of the one or more biomarkers, a minimum and/or maximum amount or concentration of the one or more biomarkers, a mean amount or concentration of the one or more biomarkers, and/or a median amount or concentration of the one or more biomarkers.

As used herein, the term “statistically significant” means at least about a 95% confidence level, preferably at least about a 97% confidence level, more preferably at least about a 98% confidence level and most preferably at least about a 99% confidence level, as determined using parametric or non-parametric statistics, for example, but not limited to ANOVA or Wilcoxon's rank-sum test, wherein the latter is expressed as p<0.05 for at least about a 95% confidence level.

As used herein, the term “immune checkpoint blockade” generally refers to a therapy which focuses on the termination of immune responses by inhibiting immune suppressor molecules, thus preventing the termination of immune responses or enabling T-lymphocytes that have become exhausted during an immune response.

Whenever the term “at least,” “greater than,” or “greater than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “at least,” “greater than” or “greater than or equal to” applies to each of the numerical values in that series of numerical values. For example, greater than or equal to 1, 2, or 3 is equivalent to greater than or equal to 1, greater than or equal to 2, or greater than or equal to 3.

Whenever the term “no more than,” “less than,” or “less than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “no more than,” “less than,” or “less than or equal to” applies to each of the numerical values in that series of numerical values. For example, less than or equal to 3, 2, or 1 is equivalent to less than or equal to 3, less than or equal to 2, or less than or equal to 1.

The use of the word “a” or “an,” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.”

The use of the term “or” in the claims is used to mean “and/or” unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and “and/or.” As used herein “another” may mean at least a second or more.

The terms “comprise,” “have,” and “include” are open-ended linking verbs. Any forms or tenses of one or more of these verbs, such as “comprises,” “comprising,” “has,” “having,” “includes,” and “including,” are also open-ended. For example, any method that “comprises,” “has,” or “includes” one or more steps is not limited to possessing only those one or more steps and also covers other unlisted steps.

III. Immunotherapy Treatments and Immune System Response Mechanisms

A. Tumor Microenvironment

An immune system can detect a wide variety of antigens, such as virus(es), parasitic worm(s), allergen(s), or cancer(s), and initiate a response in the body against foreign substances, abnormal cells and/or tissues. Cancerous growths, including malignant cancerous growths, can also be recognized by the immune cells of a subject and trigger an immune response. The activation of immune cells can trigger numerous intracellular signaling pathways, which require tight control in order to mount an adequate immune response. Cancerous growths can interact intimately with their microenvironment. A tumor may consist not only of a heterogeneous population of cancer cells but also a variety of resident and infiltrating host cells, secreted factors, and extracellular matrix proteins. Cancer and tumor progression may be profoundly influenced by interactions of cancer cells with this tumor microenvironment, which may ultimately determine tumor eradication, metastasis, therapeutic response, or resistance. The mechanisms by which the tumor microenvironment affects cancer progression may provide a therapeutic avenue in targeting components of the tumor microenvironment, such as in immune checkpoint inhibitor therapies.

The tumor microenvironment, particularly in solid tumors, may remain hostile to immune cells, such as effector T-cells. Barrages of immunosuppressive signals and shortage of essential nutrients within the tumor microenvironment may result in T-cell exhaustion. Overcoming the tumor microenvironment and determining early predictive responses to treatments may be an important factor in promoting the efficiency of immunotherapies in eradicating cancer cells in tumors. Metabolic reprogramming and plasticity of cancer cells to adapt to their rapid proliferation may be an important mechanism of treatment resistance in malignant cancers. Several immune cell types are present in the tumor microenvironment and may have an active role in cancer progression, including but not limited to macrophages, B-cells, T-cells, neutrophils, and dendritic cells.

B. Tumor Escape Mechanisms

The progression from neoplastic initiation to malignancy may happen in part because of the failure of immune surveillance. Cancer cells may escape immune recognition and elimination and create an immune-suppressive microenvironment. Due to the high nutrient consumption by cancer cells, native immune cells in the region may face a nutrient-deprived environment. Multiple metabolic byproducts of cancer cell metabolism such as lactate and the end product of glycolysis may be harmful to the native immune cells, impairing their differentiation, activation, fitness, anti-tumor function, and rendering them broadly unable to compete with the cancer cells.

Metabolic changes in the tumor microenvironment such as hypoxia may also affect the differentiation program of myeloid cells altering their antigen presenting properties. Hypoxia-mediated expression can selectively upregulate the expression of inhibitory ligands promoting T-cell immunosuppression. As cancer-mediated metabolic changes in the tumor microenvironment impact the cellular composition and function of the immune microenvironment, targeting metabolic changes of cancer cells may impact cancer cell growth and progression as well as provide therapeutic targets for improvement of anti-tumor immunity by altering the metabolic program of immune cells and their anti-tumor functions.

C. Immunotherapies

Metabolic processes may regulate immune cell response in quiescent conditions as well as during pathogenic processes such as infection, inflammation, cancer, and autoimmunity. In these complicated conditions, immunotherapies may provide a novel therapeutic avenue. Macrophages as well as other immune cells display metabolic plasticity dependent on disease pathology. Tumor infiltrating lymphocytes may be a notable part of the tumor microenvironment, and correlate with improved prognosis and response to therapy (Cogdill, Andrews, and Wargo 2017; Tomioka et al. 2018).

Immunotherapies may activate the subject's immune system to fight cancer. For effective eradication of cancer cells with immunotherapy, T-cells or other immune cells may recognize tumor peptides presented by human leukocyte antigens (HLAs). The HLA, or major histocompatibility complex, may include proteins involved in antigen presentation that are encoded by HLA genes. Checkpoint inhibitor therapy has demonstrated meaningful antitumor activity, with subject response influenced by a variety of biological factors, including complex interactions between the tumor, tumor microenvironment, and immune system (Hodi et al. 2010; Larkin, Ho and Wolchok 2015; Hugo et al. 2016; Ribas et al. 2016; Wolchok et al. 2017).

Immune checkpoint blockade therapy may be utilized to promote or inhibit T-cell activation. Immune responses may comprise an initiation phase and an activation phase, in which the immune system recognizes a danger signal and becomes activated by innate signals to fight the danger. This reaction may be one of the first steps for resisting infections and cancer but needs to be turned off once the danger is controlled, as persistence of this activation may cause tissue damage. After activation of the immune system, a termination phase follows, in which endogenous immune suppressor molecules may arrest immune responses to prevent damage. In cancer immune therapies, therapeutic approaches classically enhanced the initiation and activation of immune responses to increase the emergence and the efficacy of T-lymphocytes against cancers. Immune checkpoint blockade therapies may focus on the termination of immune responses by inhibiting immune suppressor molecules, thus preventing the termination of immune responses or awakening T-lymphocytes that became exhausted during an immune response. Blocking negatively regulating immune checkpoints may restore the capacity of exhausted immune cells to kill the cancer they infiltrate and drive surviving cancer cells into a state of dormancy.

Immune checkpoints may be co-stimulatory and inhibitory elements intrinsic to the immune system. Immune checkpoints may aid in maintaining self-tolerance and modulating the duration and amplitude of physiological immune responses to prevent injury to tissues when the immune system responds to pathogenic infection. An immune response can also be initiated when a T-cell recognizes antigens that are characteristic of a tumor cell. The equilibrium between the co-stimulatory and inhibitory signals that control the immune response from T-cells can be modulated by immune checkpoint proteins. After T-cells mature and activate in the thymus, T-cells can travel to sites of inflammation and injury to perform repair functions. T-cell function can occur either via direct action or through the recruitment of cytokines and membrane ligands involved in the immune system. The steps involved in T-cell maturation, activation, proliferation, and function can be regulated through co-stimulatory and inhibitory signals, namely through immune checkpoint proteins. Tumors can dysregulate checkpoint protein function as an immune-resistance mechanism. Thus, the development of modulators of checkpoint proteins can have therapeutic value. Non-limiting examples of immune checkpoint molecules include CTLA4 and PD-1. These checkpoint molecules can operate upstream of IL-2 in a pathway.

IV. Examples of Biomarkers Used for Predicting Immune System Response to Immunotherapies

Immunological checkpoint molecules may be members of the immunoglobulin superfamily and may be inhibitory receptors that prevent uncontrolled immune reactions. The adaptive immune response may be controlled by such checkpoint molecules, which can be used for maintaining self-tolerance and minimizing collateral tissue damage that can occur during an immune response. Numerous biomarkers of response to immune checkpoint blockade have been proposed, including PD-L1 expression, interferon-γ (IFN-γ)-based signatures, tumor mutational burden, microsatellite instability (MSI) and mismatch repair deficiency, genetic alterations including those within the antigen presenting machinery, HLA loss of heterozygosity, and T cell repertoire diversity (Herbst et al. 2014; Gao et al. 2016; Zaretsky et al. 2016; Roh et al. 2017; Sade-Feldman et al. 2017; Mariathasan et al. 2018; Chowell et al. 2019).

Owing to the diversity of biological features that can influence response to immune checkpoint blockade therapy, there has been increasing effort toward identifying biomarkers that integrate multiple biological features to better predict response to immunotherapy (Charoentong et al. 2017). In one such effort, a signature combining purity-corrected tumor mutational burden along with receptor tyrosine kinase (RTK) mutations, HLA mutations, and smoking signatures was used to predict immune checkpoint blockade response in non-small-cell lung carcinoma (NSCLC) (Anagnostou et al. 2020), while a melanoma study combined genomic, transcriptomic, and clinical data to predict response to immune checkpoint blockade (Liu et al. 2019).

Neoantigens can constitute the subset of somatic mutations that can be loaded onto MHC class I and class II molecules and presented to T cells. These neoantigens can be seen by the immune system as endogenous tumor-specific (non-self) targets. Immune checkpoint blockade is considered to exploit the ability of cytotoxic (CD8+) T cells to detect and destroy cancer cells displaying neoantigens on their MHC class I molecules (Schumacher and Schreiber 2015). Work integrating immunogenicity and neoantigen clonal structures predicted response to immune checkpoint blockade and prognosis in subjects with melanoma, lung cancer, and kidney cancers, suggesting broad applicability of the biomarker (Lu et al. 2020).

Recently, there has been an increased effort in identifying surrogate biomarkers for cancer diagnostics and progression using gene expression analyses, metabolomics, and proteomics methods. Gene expression analysis may provide insight on loss of heterozygosity, a cross-chromosomal event that may result in loss of the entire gene and surrounding chromosomal region. Loss of heterozygosity may indicate the absence of a functional tumor suppressor gene in the lost region in cancers. A tumor suppressor gene may be inactivated through either this loss or through a point mutation, leaving no tumor suppressor gene to protect the body from cancerous growth. HLA loss of heterozygosity detection may be a pan-cancer biomarker.

V. Techniques for Generating a Composite Biomarker Score

As described herein, a composite biomarker score generated by an immunogenomics-analysis system can incorporate information pertaining to damaging events in the antigen presentation machinery (e.g., HLA loss of heterozygosity) with predicted neoantigens to stratify subject response to immunotherapy. The composite biomarker score outperforms conventional single-analyte biomarkers, suggesting that complex models capturing multiple aspects of tumor escape can provide more robust stratification of subject response. In addition, such data-intensive biomarkers are clinically practical, with comprehensive tumor profiling in various clinical cohorts achieved using limited tumor tissue. These findings provide an accurate composite biomarker of response in late-stage cancer subjects, as well as evidence supporting the use of whole exome and transcriptome data in a clinical setting.

A. Generating Genomic and Transcriptomic Data

1. Biological Sample

FIG. 1 shows an example of a schematic diagram 100 for generating genomic data and transcriptomic data from a biological sample, according to some embodiments. For example, the schematic diagram 100 includes selecting a biological sample from a subject, in which the biological sample includes cancer cells. In some instances, pre-treatment blood normal and tumor samples are collected from the subject. For example, the pre-treatment blood normal and tumor samples can be collected from a subject with unresectable, stage III/IV melanoma who underwent anti-PD-1 therapy.

The biological sample can be processed to generate an immunogenomics profile of the subject, in which the profile can include comprehensive tumor mutation information, gene expression quantification, neoantigen characterization, HLA (typing, mutation, and loss of heterozygosity), T-cell receptor repertoire profiling, microsatellite instability detection, oncovirus identification, and tumor microenvironment profiling. The profile data can then be analyzed together with clinical outcome, and a composite biomarker score can be computed for the subject so as to identify the predicted level of responsiveness to a particular immunotherapy treatment.

A sample may be taken from a subject. A sample may be obtained (e.g., extracted or isolated) from or include blood (e.g., whole blood), plasma, serum, umbilical cord blood, chorionic villi, amniotic fluid, lavage fluid (e.g., bronchoalveolar, gastric, peritoneal, ductal, ear, arthroscopic), biopsy sample (e.g., from pre-implantation embryo), celocentesis sample, fetal nucleated cells or fetal cellular remnants, bile, breast milk, urine, saliva, mucosal excretions, sputum, stool, sweat, vaginal fluid, fluid from a hydrocele (e.g., of the testis), vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, bronchoalveolar lavage fluid, discharge fluid from the nipple, aspiration fluid from different parts of the body (e.g., thyroid, breast), tears, embryonic cells, or fetal cells (e.g., placental cells). In some embodiments, a blood sample is obtained by a heel or finger prick, from scalp veins, or by ear lobe puncture. The biological sample can be a fluid or tissue sample (e.g., skin sample). The biological sample can include any tissue or material derived from a living or dead subject. A biological sample can be a cell-free sample. A biological sample can comprise a protein or nucleic acid (e.g., DNA or RNA) or a fragment thereof. A sample may be fixed or may not be fixed. A sample may be embedded or may be free. A sample may be a formalin-fixed paraffin-embedded sample.

The biological sample(s) may include one or more nucleic acid molecules. The nucleic acid molecule may be a DNA molecule, an RNA molecule (e.g., mRNA, cRNA, or miRNA), or a DNA/RNA hybrid. Examples of DNA molecules include, but are not limited to, double-stranded DNA, single-stranded DNA, single-stranded DNA hairpins, cDNA, and genomic DNA. The nucleic acid may be an RNA molecule, such as a double-stranded RNA, single-stranded RNA, ncRNA, RNA hairpin, or mRNA. Examples of ncRNA include, but are not limited to, siRNA, miRNA, snoRNA, piRNA, tiRNA, PASR, TASR, aTASR, TSSa-RNA, snRNA, RE-RNA, uaRNA, x-ncRNA, hY RNA, usRNA, snaR, and vtRNA.

2. Sequencing

To generate DNA sequences corresponding to the genomic data from the biological sample, whole exome library preparation and sequencing can be performed. DNA is extracted from the biological sample, processed, and subjected to whole exome sequencing. Whole-exome capture libraries can be constructed using DNA from the tumor and normal blood samples. In some instances, target probes are used to enhance coverage of biomedically and clinically relevant genes. Protocols can be modified to yield an average library insert length of approximately 250 bp. Sequencing reads are subjected to quality control processing (e.g., via FastQC) to provide FASTQ files. FASTQ files are aligned to a reference genome to generate BAM files.

To generate RNA sequences corresponding to the transcriptomic data from the biological sample, transcriptome sequencing can be performed. In some instances, the transcriptome sequencing includes microarrays and RNA-Seq. Microarrays can be configured to measure the abundances of a defined set of transcripts via their hybridization to an array of complementary probes. RNA-Seq can refer to sequencing complementary DNAs of transcripts in the biological samples, in which abundance of the complementary DNAs is derived from the number of counts from each transcript.

In some cases, sample processing includes nucleic acid sample processing and subsequent nucleic acid sample sequencing. Some or all of a nucleic acid sample may be sequenced to provide sequence information, which may be stored or otherwise maintained in an electronic, magnetic, or optical storage location. The sequence information may be analyzed with the aid of a computer processor, and the analyzed sequence information may be stored in an electronic storage location. The electronic storage location may include a pool or collection of sequence information and analyzed sequence information generated from the nucleic acid sample.

Some embodiments may include using whole genome sequencing. In some cases, the whole genome sequencing is used to identify variants in a person. In some cases, sequencing can include deep sequencing over a fraction of the genome. For example, the fraction of the genome may be at least about 50; 75; 100; 125; 150; 175; 200; 225; 250; 275; 300; 350; 400; 450; 500; 550; 600; 650; 700; 750; 800; 850; 900; 950; 1,000; 1100; 1200; 1300; 1400; 1500; 1600; 1700; 1800; 1900; 2,000; 3,000; 4,000; 5,000; 6,000; 7,000; 8,000; 9,000; 10,000; 15,000; 20,000; 30,000; 40,000; 50,000; 60,000; 70,000; 80,000; 90,000; 100,000 or more bases or base pairs. In some cases, the genome may be sequenced over 1 million, 2 million, 3 million, 4 million, 5 million, 6 million, 7 million, 8 million, 9 million, 10 million or more than 10 million bases or base pairs. In some cases, the genome may be sequenced over an entire exome (e.g., whole exome sequencing). In some cases, the deep sequencing may include acquiring multiple reads over the fraction of the genome. For example, acquiring multiple reads may include at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 10,000 reads or more than 10,000 reads over the fraction of the genome.

Some embodiments may include detecting low allelic fractions by deep sequencing. In some cases, the deep sequencing is done by next generation sequencing. In some cases, the deep sequencing is done by avoiding error-prone regions. In some cases, the error-prone regions may include regions of near sequence duplication, regions of unusually high or low % GC, regions of near homopolymers, di- and tri-nucleotide repeats, and regions of near other short repeats. In some cases, the error-prone regions may include regions that lead to DNA sequencing errors (e.g., polymerase slippage in homopolymer sequences).

Some embodiments may include conducting one or more sequencing reactions on one or more nucleic acid molecules in a sample. Some embodiments may include conducting 1 or more, 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 15 or more, 20 or more, 30 or more, 40 or more, 50 or more, 60 or more, 70 or more, 80 or more, 90 or more, 100 or more, 200 or more, 300 or more, 400 or more, 500 or more, 600 or more, 700 or more, 800 or more, 900 or more, or 1000 or more sequencing reactions on one or more nucleic acid molecules in a sample. The sequencing reactions may be run simultaneously, sequentially, or a combination thereof. The sequencing reactions may include whole genome sequencing or exome sequencing. The sequencing reactions may include Maxam-Gilbert, chain-termination, or high-throughput systems. Alternatively, or additionally, the sequencing reactions may include HeliScope™ single molecule sequencing, Nanopore DNA sequencing, Lynx Therapeutics' Massively Parallel Signature Sequencing (MPSS), 454 pyrosequencing, Single Molecule real time (RNAP) sequencing, Illumina (Solexa) sequencing, SOLiD sequencing, Ion Torrent™, Ion semiconductor sequencing, Single Molecule SMRT™ sequencing, Polony sequencing, DNA nanoball sequencing, VisiGen Biotechnologies approach, or a combination thereof. Alternatively, or additionally, the sequencing reactions can include one or more sequencing platforms, including, but not limited to, Genome Analyzer IIx, HiSeq, and MiSeq offered by Illumina, Single Molecule Real Time (SMRT™) technology, such as the PacBio RS system offered by Pacific Biosciences (California) and the Solexa Sequencer, True Single Molecule Sequencing (tSMS™) technology such as the HeliScope™ Sequencer offered by Helicos Inc. (Cambridge, Mass.). Sequencing reactions may also include electron microscopy or a chemical-sensitive field effect transistor (chemFET) array.

In some aspects, sequencing reactions include capillary sequencing, next generation sequencing, Sanger sequencing, sequencing by synthesis, sequencing by ligation, sequencing by hybridization, single molecule sequencing, or a combination thereof. Sequencing by synthesis may include reversible terminator sequencing, processive single molecule sequencing, sequential flow sequencing, or a combination thereof. Sequential flow sequencing may include pyrosequencing, pH-mediated sequencing, semiconductor sequencing, or a combination thereof.

Some embodiments may include conducting at least one long read sequencing reaction and at least one short read sequencing reaction. The long read sequencing reaction and/or short read sequencing reaction may be conducted on at least a portion of a subset of nucleic acid molecules. The long read sequencing reaction and/or short read sequencing reaction may be conducted on at least a portion of two or more subsets of nucleic acid molecules. Both a long read sequencing reaction and a short read sequencing reaction may be conducted on at least a portion of one or more subsets of nucleic acid molecules.

Sequencing of the one or more nucleic acid molecules or subsets thereof may include at least about 5; 10; 15; 20; 25; 30; 35; 40; 45; 50; 60; 70; 80; 90; 100; 200; 300; 400; 500; 600; 700; 800; 900; 1,000; 1500; 2,000; 2500; 3,000; 3500; 4,000; 4500; 5,000; 5500; 6,000; 6500; 7,000; 7500; 8,000; 8500; 9,000; 10,000; 25,000; 50,000; 75,000; 100,000; 250,000; 500,000; 750,000; 10,000,000; 25,000,000; 50,000,000; 100,000,000; 250,000,000; 500,000,000; 750,000,000; 1,000,000,000 or more sequencing reads.

Sequencing reactions may include sequencing at least about 50; 60; 70; 80; 90; 100; 110; 120; 130; 140; 150; 160; 170; 180; 190; 200; 210; 220; 230; 240; 250; 260; 270; 280; 290; 300; 325; 350; 375; 400; 425; 450; 475; 500; 600; 700; 800; 900; 1,000; 1500; 2,000; 2500; 3,000; 3500; 4,000; 4500; 5,000; 5500; 6,000; 6500; 7,000; 7500; 8,000; 8500; 9,000; 10,000; 20,000; 30,000; 40,000; 50,000; 60,000; 70,000; 80,000; 90,000; 100,000 or more bases or base pairs of one or more nucleic acid molecules. Sequencing reactions may include sequencing at least about 50; 60; 70; 80; 90; 100; 110; 120; 130; 140; 150; 160; 170; 180; 190; 200; 210; 220; 230; 240; 250; 260; 270; 280; 290; 300; 325; 350; 375; 400; 425; 450; 475; 500; 600; 700; 800; 900; 1,000; 1500; 2,000; 2500; 3,000; 3500; 4,000; 4500; 5,000; 5500; 6,000; 6500; 7,000; 7500; 8,000; 8500; 9,000; 10,000; 20,000; 30,000; 40,000; 50,000; 60,000; 70,000; 80,000; 90,000; 100,000 or more consecutive bases or base pairs of one or more nucleic acid molecules.

Preferably, the sequencing techniques used in the methods of the invention generate at least 100 reads per run, at least 200 reads per run, at least 300 reads per run, at least 400 reads per run, at least 500 reads per run, at least 600 reads per run, at least 700 reads per run, at least 800 reads per run, at least 900 reads per run, at least 1000 reads per run, at least 5,000 reads per run, at least 10,000 reads per run, at least 50,000 reads per run, at least 100,000 reads per run, at least 500,000 reads per run, or at least 1,000,000 reads per run. Alternatively, the sequencing technique used in the methods of the invention generates at least 1,500,000 reads per run, at least 2,000,000 reads per run, at least 2,500,000 reads per run, at least 3,000,000 reads per run, at least 3,500,000 reads per run, at least 4,000,000 reads per run, at least 4,500,000 reads per run, or at least 5,000,000 reads per run.

Preferably, the sequencing techniques used in the methods of the invention can generate at least about 30 base pairs, at least about 40 base pairs, at least about 50 base pairs, at least about 60 base pairs, at least about 70 base pairs, at least about 80 base pairs, at least about 90 base pairs, at least about 100 base pairs, at least about 110 base pairs, at least about 120 base pairs, at least about 150 base pairs, at least about 200 base pairs, at least about 250 base pairs, at least about 300 base pairs, at least about 350 base pairs, at least about 400 base pairs, at least about 450 base pairs, at least about 500 base pairs, at least about 550 base pairs, at least about 600 base pairs, at least about 700 base pairs, at least about 800 base pairs, at least about 900 base pairs, or at least about 1,000 base pairs per read. Alternatively, the sequencing technique used in the methods of the invention can generate long sequencing reads. In some instances, the sequencing technique used in the methods of the invention can generate at least about 1,200 base pairs per read, at least about 1,500 base pairs per read, at least about 1,800 base pairs per read, at least about 2,000 base pairs per read, at least about 2,500 base pairs per read, at least about 3,000 base pairs per read, at least about 3,500 base pairs per read, at least about 4,000 base pairs per read, at least about 4,500 base pairs per read, at least about 5,000 base pairs per read, at least about 6,000 base pairs per read, at least about 7,000 base pairs per read, at least about 8,000 base pairs per read, at least about 9,000 base pairs per read, at least about 10,000 base pairs per read, 20,000 base pairs per read, 30,000 base pairs per read, 40,000 base pairs per read, 50,000 base pairs per read, 60,000 base pairs per read, 70,000 base pairs per read, 80,000 base pairs per read, 90,000 base pairs per read, or 100,000 base pairs per read.

High-throughput sequencing systems may allow detection of a sequenced nucleotide immediately after or upon its incorporation into a growing strand, i.e., detection of sequence in real time or substantially real time. In some cases, high throughput sequencing generates at least 1,000, at least 5,000, at least 10,000, at least 20,000, at least 30,000, at least 40,000, at least 50,000, at least 100,000 or at least 500,000 sequence reads per hour, with each read being at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 120, at least 150, at least 200, at least 250, at least 300, at least 350, at least 400, at least 450, or at least 500 bases per read. Sequencing can be performed using nucleic acids described herein such as genomic DNA, cDNA derived from RNA transcripts, or RNA as a template.

3. Alignment

Sequence reads (e.g., the DNA sequences, the RNA sequences) generated by the above sequencing techniques can be mapped to a corresponding reference genome (e.g., hs37d5 reference genome build). In some instances, an alignment pipeline performs alignment, duplicate removal, and base quality score recalibration (BQSR) to generate the genomic and transcriptomic data. The pipeline uses the Picard toolkit (RRID:SCR_006525) for duplicate removal and the Genome Analysis Toolkit (GATK, RRID:SCR_001876) to improve sequence alignment and to correct base quality scores. Aligned sequence data is then returned in BAM format according to the SAM (RRID:SCR_01095) specification. In some instances, the somatic variants are identified based on the alignment of the sequence reads to the reference genome.

In some instances, whole-transcriptome sequencing reads are aligned using STAR (RRID:SCR_015899), and normalized expression values in transcripts per million (TPM) are calculated. For RNA sequencing and alignment quality control, the following metrics can be identified: average read length, percentage of uniquely mapped reads, average mapped read pair length, number of splice sites, mismatch rate per base, deletion/insertion rate per base, mean deletion/insertion length, and anomalous read pair alignments including inter-chromosomal and orphaned reads.
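The TPM normalization mentioned above can be illustrated with a minimal sketch. It assumes that per-gene read counts and transcript lengths in kilobases are already available; the function name `tpm` and the gene labels are hypothetical and are not part of the described pipeline.

```python
def tpm(counts, lengths_kb):
    """Convert raw per-gene read counts to transcripts per million (TPM)."""
    # Reads per kilobase: divide each gene's count by its transcript length.
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    scale = sum(rpk.values())
    if scale == 0:
        # No reads at all: every gene gets an expression value of zero.
        return {gene: 0.0 for gene in counts}
    # Rescale so that the values of one sample sum to one million.
    return {gene: 1e6 * value / scale for gene, value in rpk.items()}


# Hypothetical toy input: GENE_B has three times the reads of GENE_A but is
# also three times as long, so both end up with the same TPM.
example_counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 0}
example_lengths_kb = {"GENE_A": 1.0, "GENE_B": 3.0, "GENE_C": 2.0}
example_tpm = tpm(example_counts, example_lengths_kb)
```

Because the length normalization is applied before scaling to one million, TPM values of a gene can be compared across samples more directly than raw counts.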

B. Transcriptomic Metrics Derived from Transcriptomic Data

The immunogenomics-analysis system processes the transcriptomic data corresponding to the biological sample to generate a set of transcriptomic metrics. Each of the set of transcriptomic metrics can represent one or more characteristics corresponding to a set of peptides that are translated from a corresponding RNA sequence of the one or more RNA sequences. In some instances, the set of transcriptomic metrics includes: (i) a quantitative or categorical metric that represents a predicted neoantigen burden of the biological sample; (ii) a quantitative or categorical metric that represents one or more characteristics of each of one or more candidate neoantigens detected from the biological sample; (iii) a quantitative or categorical metric that represents one or more characteristics of each of one or more HLA proteins for which a loss of cell-surface presentation is detected; (iv) a quantitative or categorical metric that represents one or more characteristics corresponding to an HLA gene that encodes the one or more HLA proteins for which the loss of cell-surface presentation was detected; and (v) a quantitative or categorical metric that represents an expression level of one or more T-cell receptors detected from the biological sample. With respect to the HLA proteins for which a loss of cell-surface presentation is detected, the corresponding metric can be generated by applying the genomic and transcriptomic data to a neoantigen-presentation-prediction machine-learning model.

1. Immune Infiltrate Signatures

The set of transcriptomic metrics can include a quantitative or categorical metric that represents an expression level of a sequence corresponding to an immune cell. In some instances, the quantitative or categorical metric is an immune infiltration score, which is derived based on quantities of different types of tumor-infiltrating immune cells. The immune infiltration scores can be calculated using transcriptomic data. For example, semi-quantitative scores representing the enrichment of gene sets can be calculated in single samples. In some instances, a set of reference gene expression signatures representing 17 cell types is used to generate the immune infiltration scores, in which the cell types may include malignant cells, cancer-associated fibroblasts (CAFs), endothelial cells, NK cells, B cells, macrophages, and CD8⁺ and CD4⁺ T cells.

To generate the immune infiltration scores, gene set enrichment analysis can be used to compute an enrichment score that is high when the genes specific for a certain cell type are among the most highly expressed in the sample of interest (i.e., the cell type is enriched in the sample) and low otherwise. Enrichment scores for the same cell type (gene set) can be compared across samples, profiling immune infiltration for the subject. Additionally or alternatively, the immune infiltration score is generated using deconvolution techniques that can quantitatively estimate the relative fractions of the cell types of interest (e.g., cancer cells). Deconvolution algorithms consider gene expression profiles of a heterogeneous sample as the convolution of the gene expression levels of the different cells, and estimate the unknown cell fractions by leveraging a signature matrix describing the cell-type-specific expression profiles.
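The idea of a single-sample enrichment score can be sketched as follows. This is a deliberately simplified rank-based stand-in, not the gene set enrichment statistic itself: it averages the normalized expression rank of the gene-set members, so it is high when those genes are among the most highly expressed in the sample. The function name, the gene signatures, and the expression values are hypothetical.

```python
def rank_enrichment(expression, gene_set):
    """Mean normalized expression rank of gene-set members (1.0 = top gene)."""
    # Order genes from highest to lowest expression in this sample.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two genes to rank")
    # Position 0 (highest expression) maps to 1.0, the last position to 0.0.
    rank_score = {gene: 1.0 - i / (n - 1) for i, gene in enumerate(ordered)}
    members = [gene for gene in gene_set if gene in rank_score]
    if not members:
        raise ValueError("no gene-set members present in the sample")
    return sum(rank_score[gene] for gene in members) / len(members)


# Hypothetical single-sample expression profile (arbitrary units).
sample = {"CD8A": 90.0, "GZMB": 80.0, "ACTB": 50.0, "ALB": 5.0, "INS": 1.0}
cd8_signature = {"CD8A", "GZMB"}  # assumed two-gene signature for illustration
unrelated_set = {"ALB", "INS"}

cd8_score = rank_enrichment(sample, cd8_signature)
unrelated_score = rank_enrichment(sample, unrelated_set)
```

In this toy sample the assumed CD8 signature scores near the top of the range while the unrelated set scores near the bottom. Because the score depends only on within-sample ranks, scores for the same gene set can be compared across samples.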

2. Expression Levels of T-Cell Receptors

The set of transcriptomic metrics can include a quantitative or categorical metric that represents an expression level of one or more T-cell receptors detected from the biological sample. The expression level of the one or more T-cell receptors can identify a level and distribution of clonal lymphocytes detected in the biological sample. Quality and quantity of lymphocytes from the biological sample can be used to identify various factors affecting the subject's health and disease. The expression level of the one or more T-cell receptors can be interpreted as having normal immune diversity, development, or reconstitution, or can be otherwise interpreted as having inflammation, infection, vaccination, autoimmunity, or cancer. In some instances, a number of analytic parameters are used to assess the quality and quantity of a lymphoid infiltrate of the biological sample. The analytic parameters may include diversity, richness, evenness, clonality, and entropy metrics.

In some instances, the expression level of the one or more T-cell receptors corresponds to clonality of T-cell receptor β (TCR-β) sequences detected in the biological sample. The immunogenomics-analysis system processes the transcriptomic data to profile TCR-β clones, which provides augmented (approximately a 100× increase over a standard transcriptome) coverage of TCR-β. Nonproductive clones, which have a frame-shift or premature stop codon in the CDR3 sequence, can be filtered out, as well as low-confidence clones, which have an alignment score below a threshold for the V or J hit. Clonality can then be calculated as 1 − Pielou's evenness.
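A minimal sketch of the clonality calculation follows. Pielou's evenness is the Shannon entropy of the clone frequencies divided by the natural logarithm of the number of distinct clones, and clonality is one minus that value. The function name and the example clone counts are hypothetical, and the handling of a single-clone repertoire (returned as fully clonal) is a convention assumed here because evenness is undefined in that case.

```python
import math


def clonality(clone_counts):
    """Clonality = 1 - Pielou's evenness for a list of per-clone read counts."""
    counts = [c for c in clone_counts if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("no clones with positive counts")
    if len(counts) == 1:
        # Assumed convention: a single clone is treated as fully clonal.
        return 1.0
    # Shannon entropy of the clone frequency distribution.
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    # Pielou's evenness: entropy relative to its maximum, ln(richness).
    evenness = entropy / math.log(len(counts))
    return 1.0 - evenness


# Hypothetical repertoires: one perfectly even, one dominated by a single clone.
even_repertoire = [10, 10, 10, 10]
skewed_repertoire = [97, 1, 1, 1]
even_clonality = clonality(even_repertoire)
skewed_clonality = clonality(skewed_repertoire)
```

Values near 0 indicate a diverse, evenly distributed repertoire, and values near 1 indicate a repertoire dominated by one or a few expanded clones.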

3. Differential Gene Expressions

The set of transcriptomic metrics can include a quantitative metric that represents read counts per gene identified in the transcriptomic data. For example, counts per million of sequence reads can be calculated by normalizing read counts per gene by the total number of reads identified in the biological sample. In some instances, a threshold is selected as to whether a particular gene should be part of the quantitative metric. For example, only genes with read counts per million >0 in 25% or more of the samples of a cohort can be included for analysis. In some instances, remaining data are processed using rlog transformation and differential gene expression is analyzed. Genes with an adjusted p value <0.05 and a log2 fold change of <−0.5 or >1 can be considered differentially expressed. Biological significance of differentially expressed genes can be identified at the pathway level using various gene sets, including but not limited to MSigDB (Molecular Signatures Database, RRID:SCR_016863) hallmark gene sets and KEGG (RRID:SCR_012773) gene sets.
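The normalization, cohort-level filter, and differential expression thresholds described above can be sketched as follows. The function names and the toy cohort are hypothetical; the rlog transformation and the statistical test that produces adjusted p values are outside the scope of this sketch and are assumed to be computed elsewhere.

```python
def counts_per_million(sample_counts):
    """Normalize per-gene read counts by the sample's total read count."""
    total = sum(sample_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in sample_counts}
    return {gene: 1e6 * c / total for gene, c in sample_counts.items()}


def expressed_genes(cohort_counts, min_fraction=0.25):
    """Genes with counts per million > 0 in at least min_fraction of samples."""
    cpm = [counts_per_million(sample) for sample in cohort_counts]
    genes = set().union(*(sample.keys() for sample in cohort_counts))
    keep = set()
    for gene in genes:
        n_positive = sum(1 for sample in cpm if sample.get(gene, 0.0) > 0)
        if n_positive / len(cpm) >= min_fraction:
            keep.add(gene)
    return keep


def is_differentially_expressed(adjusted_p, log2_fold_change):
    """Apply the thresholds from the text: p < 0.05 and log2FC < -0.5 or > 1."""
    return adjusted_p < 0.05 and (log2_fold_change < -0.5 or log2_fold_change > 1.0)


# Hypothetical cohort of four samples with raw read counts per gene.
cohort = [
    {"GENE_A": 12, "GENE_B": 0, "GENE_C": 500},
    {"GENE_A": 0, "GENE_B": 0, "GENE_C": 450},
    {"GENE_A": 0, "GENE_B": 0, "GENE_C": 610},
    {"GENE_A": 0, "GENE_B": 0, "GENE_C": 380},
]
retained = expressed_genes(cohort)
```

Here GENE_A is detected in exactly one of four samples (25%) and is retained, while GENE_B is never detected and is dropped before differential expression analysis.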

4. Neoantigen-Presentation Prediction

The set of transcriptomic metrics can include a quantitative or categorical metric that represents one or more characteristics of each of one or more HLA proteins for which a loss of cell-surface presentation is detected. In particular, the transcriptomic metric can correspond to patient-specific tumor alterations that could interfere with neoantigen presentation, including HLA mutations, HLA loss of heterozygosity, and beta-2-microglobulin mutations.

The neoantigen-presentation prediction metric can be generated by identifying candidate neoantigens generated using tumor-specific genomic events (single-nucleotide variants, indels, and fusions) that were verified using the transcriptomic data. All candidate peptides can be scored using a neoantigen-presentation-prediction machine-learning model for predicting MHC class I presentation, which can be trained using large-scale immunopeptidome datasets. The trained neoantigen-presentation-prediction machine-learning model can use data corresponding to each of the candidate peptides to generate an output that predicts whether the candidate peptide will be presented and expressed on the cell surface. Based on the output of the machine-learning model, a neoantigen burden score can be calculated using a subset of candidate peptides that pass a confidence threshold. To calculate the composite biomarker score, the neoantigen burden score can be adjusted to account for subject-specific tumor alterations which may impair neoantigen presentation, including alterations to the MHC complex and antigen presentation machinery and HLA loss of heterozygosity.
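One way such a thresholded burden score and its adjustment might look is sketched below. Everything here is an assumption made for illustration: the `CandidatePeptide` record, the score range, the 0.5 threshold, the placeholder peptide strings, and in particular the adjustment rule, which simply excludes peptides whose presenting HLA allele has been lost. The text does not specify how the adjustment is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePeptide:
    sequence: str              # placeholder peptide label, not a real epitope
    hla_allele: str            # allele predicted to present the peptide
    presentation_score: float  # hypothetical model output in [0, 1]


def neoantigen_burden(candidates, threshold=0.5, lost_alleles=frozenset()):
    """Count candidates at or above the threshold whose allele is retained."""
    return sum(
        1
        for c in candidates
        if c.presentation_score >= threshold and c.hla_allele not in lost_alleles
    )


candidates = [
    CandidatePeptide("PEPTIDE_1", "HLA-A*02:01", 0.91),
    CandidatePeptide("PEPTIDE_2", "HLA-A*02:01", 0.35),
    CandidatePeptide("PEPTIDE_3", "HLA-B*07:02", 0.77),
    CandidatePeptide("PEPTIDE_4", "HLA-C*07:01", 0.64),
]

# Unadjusted burden: every candidate that passes the confidence threshold.
raw_burden = neoantigen_burden(candidates)
# Assumed adjustment: ignore peptides restricted to an allele lost through LOH.
adjusted_burden = neoantigen_burden(candidates, lost_alleles={"HLA-B*07:02"})
```

In this toy case three candidates pass the threshold, and removing the peptide restricted to the lost allele lowers the adjusted burden to two.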

C. Genomic Metrics Derived from Genomic Data

The immunogenomics-analysis system can process the genomic data to generate a set of genomic metrics. Each of the set of genomic metrics can represent one or more characteristics corresponding to a corresponding DNA sequence of the one or more DNA sequences. In some instances, the set of genomic metrics includes: (i) a quantitative or categorical metric that represents one or more characteristics for each of one or more somatic mutations in the one or more DNA sequences; (ii) a categorical metric that indicates whether a loss of heterozygosity has occurred in at least one human leukocyte antigen (HLA) gene of the biological sample; and (iii) a quantitative or categorical metric that represents a predicted tumor mutational burden. With respect to the HLA loss of heterozygosity, the corresponding categorical metric can be generated by applying the genomic data to an HLA-deletion-identification machine-learning model.

1. Single-Nucleotide Variants and Indels

The set of genomic metrics can include a quantitative or categorical metric that represents one or more characteristics for each of one or more somatic mutations in the one or more DNA sequences. The one or more somatic mutations can include single-nucleotide variants, insertion/deletion polymorphisms, copy number alterations, and fusions in one or more nucleic acid molecules of the DNA sequences. In some instances, quality metrics can be generated for each identified mutation in the DNA sequences, including number of mutations, a ratio of transitions to transversions, variant-level concordance, etc. For example, the genomic data can be processed using a quality score recalibration module, which can stratify single-nucleotide variants by their likelihood of representing false-positive calls. In some instances, sequence alignment information of the genomic data can be processed such that miscalled variants can be corrected. Additionally or alternatively, somatic single-nucleotide variant and indel calls can be combined and analyzed through a tested set of filters based on 1) alignment metrics, such as sequence coverage and read quality, 2) positional features, such as proximity to a gap region, and 3) likelihood of presence in normal tissue.
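As a non-limiting illustration of one of the quality metrics named above, the ratio of transitions to transversions can be computed from a list of single-nucleotide variant calls. The function name and the (reference base, alternate base) input format are assumptions made for this sketch.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes;
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs):
    """Return transitions/transversions for (ref, alt) pairs, or None if no transversions."""
    ti = 0
    tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt:
            continue  # not a substitution; skip
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else None


example_calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
ratio = ti_tv_ratio(example_calls)
```

A ratio far from the value expected for the sequenced region can flag a call set enriched for false positives, which is why it is commonly tracked as a quality metric.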

2. Allele-Specific HLA Loss of Heterozygosity

The set of genomic metrics can also include a categorical metric that indicates whether a loss of heterozygosity has occurred in at least one HLA gene of the biological sample. HLA loss of heterozygosity can be detected using an HLA-deletion-identification machine-learning model, as HLA loss of heterozygosity can impact neoantigen presentation. HLA loss of heterozygosity can be considered an acquired resistance mechanism that facilitates immune escape by reducing capacity for presentation of tumor neoantigens to the immune system. As the process of HLA loss is governed by selective pressures within the tumor microenvironment, particularly at later stages of tumor evolution, it was hypothesized that, within the cohort of late-stage melanoma subjects, allele-specific HLA loss of heterozygosity could contribute to reduced therapeutic response despite apparent elevated neoantigen burden.

To generate the above genomic metric, the biological sample can be processed using the following steps: 1) all tumor and normal reads were mapped to the subject's allele-specific HLA; 2) homologous alleles were aligned to find all patient-specific mismatch positions; and 3) normalized b-allele frequencies and allele-specific coverage ratios were calculated at each mismatch position. For each gene, allele-specific features, including normalized b-allele frequencies and allele-specific mismatched positions, tumor purity, and tumor ploidy, were input into the HLA-deletion-identification machine-learning model to predict loss of heterozygosity.
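As a non-limiting illustration, step 3) above can be sketched for a single HLA gene as follows. The disclosure does not define how the b-allele frequency is normalized or how the coverage ratio is formed, so this sketch assumes that the tumor b-allele frequency is divided by the matched normal b-allele frequency and that each allele's coverage ratio is tumor reads over normal reads. The dictionary keys are hypothetical.

```python
def b_allele_frequency(a_reads, b_reads):
    """Fraction of reads supporting allele B at one mismatch position."""
    total = a_reads + b_reads
    return b_reads / total if total else None


def allele_features(positions):
    """Compute per-position features from allele-specific read counts.

    Each position is a dict with hypothetical keys:
    tumor_a, tumor_b, normal_a, normal_b.
    Positions lacking normal coverage for either allele are skipped.
    """
    features = []
    for pos in positions:
        if pos["normal_a"] == 0 or pos["normal_b"] == 0:
            continue
        tumor_baf = b_allele_frequency(pos["tumor_a"], pos["tumor_b"])
        normal_baf = b_allele_frequency(pos["normal_a"], pos["normal_b"])
        if tumor_baf is None:
            continue
        features.append({
            # Assumed normalization: tumor BAF relative to matched normal BAF.
            "normalized_baf": tumor_baf / normal_baf,
            # Assumed coverage ratio: tumor reads over normal reads, per allele.
            "coverage_ratio_a": pos["tumor_a"] / pos["normal_a"],
            "coverage_ratio_b": pos["tumor_b"] / pos["normal_b"],
        })
    return features


example = allele_features([
    {"tumor_a": 90, "tumor_b": 10, "normal_a": 50, "normal_b": 50},
    {"tumor_a": 40, "tumor_b": 0, "normal_a": 0, "normal_b": 30},
])
```

A normalized b-allele frequency well below 1 across many mismatch positions, together with a depressed coverage ratio for the same allele, is the kind of pattern such features would present to a downstream classifier.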

3. Mutational Burden

The set of genomic metrics can include a quantitative or categorical metric that represents a predicted tumor mutational burden. The tumor mutational burden can refer to the total number of mutations (changes) found in the DNA of cancer cells. Knowing the tumor mutational burden may help plan the best treatment, and the tumor mutational burden has been identified as a potential biomarker for immune checkpoint blockade response.

D. Generating the Composite Biomarker Score

The immunogenomics-analysis system generates a composite biomarker score derived from the set of genomic metrics and the set of transcriptomic metrics and determines, based on the composite biomarker score, a predicted level of responsiveness of the subject to a particular type of an immunotherapy treatment. For example, the composite biomarker score can be generated by using the transcriptomic metric corresponding to a neoantigen burden score, which can be adjusted based on the predicted tumor mutational burden identified from the genomic data. The composite biomarker score can thus account for impairment to neoantigen presentation and other established resistance markers. Integrating antigen presentation into the composite biomarker score may strengthen prediction levels associated with immune checkpoint blockade response.

While elevated measures of neoantigen burden may be predictive of which subjects will benefit from immunotherapy, the composite biomarker score can be derived based on genomic and transcriptomic metrics corresponding to additional resistance mechanisms arising from genetic variation in the antigen presentation machinery, at both the germline and somatic levels. These additional resistance mechanisms can further modulate immune response by diminishing capacity for neoantigen presentation. Thus, the composite biomarker can use the metric corresponding to neoantigen burden as a biomarker, but can further include genomic and transcriptomic metrics corresponding to additional data derived from subsequent processing steps and longitudinal treatments, as well as RNA expression levels.

In some instances, the composite biomarker score corresponds to a neoantigen burden score that is adjusted to account for subject-specific tumor alterations that could further interfere with neoantigen presentation, including HLA mutations, HLA loss of heterozygosity, and beta-2-microglobulin mutations. As a result, analysis of subjects using the composite biomarker score can result in improved prediction of therapy outcome, when compared to neoantigen burden and tumor mutational burden individually. A composite biomarker approach that models both biological mechanisms and impairment of neoantigen presentation can serve as a stronger biomarker for immune checkpoint blockade therapy than many of the current biomarkers built around simpler biological models of tumor immune response. Unlike tumor mutational burden-based approaches, the composite biomarker score can be generated by modeling broader mechanisms of neoantigen presentation.

Additionally or alternatively, a subset of somatic mutations associated with reduced response to immunotherapy (e.g., HLA class I and B2M mutations, loss of heterozygosity in HLA class I genes) can be weighted to adjust the composite biomarker score. By accounting for these escape mechanisms, the composite biomarker score can capture a fuller representation of tumor antigen presentation to the immune system to increase the predictive strength of this biomarker. The above approach can produce more accurate results when applied to one or more specific types of cancers, such as non-small-cell lung carcinoma and squamous cell carcinoma of the head and neck subject cohorts, since HLA loss of heterozygosity was identified as a prevalent escape mechanism that affects cancer progression for those types. For example, tumor data revealed allele-specific expression loss at frequencies above 45% in head and neck, lung adenocarcinoma, pancreatic, and prostate cancers. HLA loss of heterozygosity, combined with the prevalence of somatic mutations in class I HLA genes, can be captured by the composite biomarker score to identify damaging events to antigen-presenting machinery.
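As a non-limiting illustration, one way to weight escape mechanisms when adjusting a neoantigen burden score is sketched below. The multiplicative form, the event names, and every weight value are hypothetical; the disclosure does not state the weighting function or its parameters.

```python
# Hypothetical penalty weights; each is the assumed fractional reduction in
# effective neoantigen presentation when the corresponding event is detected.
ESCAPE_WEIGHTS = {
    "hla_loh": 0.5,
    "hla_mutation": 0.3,
    "b2m_mutation": 0.4,
}


def composite_score(neoantigen_burden, events):
    """Scale the neoantigen burden down once for each detected escape event."""
    score = float(neoantigen_burden)
    for event, weight in ESCAPE_WEIGHTS.items():
        if events.get(event, False):
            score *= 1.0 - weight
    return score


unadjusted = composite_score(100, {})
with_loh = composite_score(100, {"hla_loh": True})
with_loh_and_b2m = composite_score(100, {"hla_loh": True, "b2m_mutation": True})
```

A multiplicative penalty is used here so that co-occurring events compound; an additive penalty or a learned model would be alternative designs.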

Thus, the composite biomarker score can integrate a broad set of biological features across multiple dimensions: exome and transcriptome, tumor and immune, response and resistance. The composite biomarker score can then be used for predicting immune checkpoint blockade response in a manner that reflects the biological mechanisms driving response and resistance to immunotherapies.

E. Treatment Selection

The composite biomarker score can serve as a strong predictor for immune checkpoint blockade therapy response. As shown in the figures, the composite biomarker score achieved greater separation of immune checkpoint blockade therapy responders and non-responders than tumor mutational burden and other single-analyte/gene and expression signatures examined in the discovery cohort. The value of the composite biomarker score for predicting responsiveness to particular immunotherapies was further demonstrated by confirming these findings in a large independent validation cohort.

The composite biomarker score can further demonstrate that neoantigens can guide immune response, promoting clinical response to immunotherapy. While only weak association was observed between response and tumor mutational burden, stronger association between neoantigen burden and subject response was apparent. It has been suggested that this finding may be attributed to confounding effects of the distribution of melanoma subtypes within patient cohorts in various clinical studies, which negatively impact the predictive power of tumor mutational burden. However, such issues involving the cohorts did not appear to affect neoantigen burden. It is possible that the increased robustness of neoantigen burden as a biomarker was achieved through the inclusion of additional data from subsequent processing steps, as well as RNA expression levels, as this measure has been found to correlate with protein representation in the MHC-bound peptide repertoire.

In some instances, additional factors influencing subject response are identified outside of neoantigen burden. As an illustrative example, within the discovery cohort, the non-responding outlier with the highest observed composite biomarker score also includes a high-impact, nonsense PD-1 mutation, which can be interpreted as likely preventing response to anti-PD1 therapy. The outlier, non-responding subject in the validation cohort with a high composite biomarker score corresponds to a subject with metastatic desmoplastic melanoma, which is associated with high levels of mutational burden and distinct clinicopathologic and genetic features compared to typical cutaneous melanomas. Thus, using clinical response data with the composite biomarker score can identify a level of heterogeneity of subject response to immunotherapies. Further, the combination of clinical response data with the composite biomarker score can identify subsets of malignancies vulnerable to specific therapy combinations. Finally, the combination of clinical response data with the composite biomarker score can identify other mechanisms of therapy resistance or response that extend beyond neoantigen presentation.

The composite biomarker score can thus be used to determine a treatment method to prevent, arrest, reverse, or ameliorate a disease. The disease may be a cancer. The composite biomarker score can indicate a predicted level of responsiveness of the subject. Accordingly, the composite biomarker score can be output as a report that identifies, based on the predicted level of responsiveness of the subject to the particular treatment: (i) a treatment recommendation of the particular treatment; (ii) a recommendation to administer the particular treatment to the human subject; and/or (iii) a recommendation to not administer the particular treatment to the human subject. In some embodiments, the recommended treatment is administered to the human subject.

Non-limiting examples of cancers include: acute lymphoblastic leukemia, acute myeloid leukemia, adrenocortical carcinoma, AIDS-related cancers, AIDS-related lymphoma, anal cancer, appendix cancer, astrocytomas, neuroblastoma, basal cell carcinoma, bile duct cancer, bladder cancer, bone cancers, brain tumors, such as cerebellar astrocytoma, cerebral astrocytoma/malignant glioma, ependymoma, medulloblastoma, supratentorial primitive neuroectodermal tumors, visual pathway and hypothalamic glioma, breast cancer, bronchial adenomas, Burkitt lymphoma, carcinoma of unknown primary origin, central nervous system lymphoma, cerebellar astrocytoma, cervical cancer, childhood cancers, chronic lymphocytic leukemia, chronic myelogenous leukemia, chronic myeloproliferative disorders, colon cancer, cutaneous T-cell lymphoma, desmoplastic small round cell tumor, endometrial cancer, ependymoma, esophageal cancer, Ewing's sarcoma, germ cell tumors, gallbladder cancer, gastric cancer, gastrointestinal carcinoid tumor, gastrointestinal stromal tumor, gliomas, hairy cell leukemia, head and neck cancer, heart cancer, hepatocellular (liver) cancer, Hodgkin lymphoma, hypopharyngeal cancer, intraocular melanoma, islet cell carcinoma, Kaposi sarcoma, kidney cancer, laryngeal cancer, lip and oral cavity cancer, liposarcoma, liver cancer, lung cancers, such as non-small cell and small cell lung cancer, lymphomas, leukemias, macroglobulinemia, malignant fibrous histiocytoma of bone/osteosarcoma, medulloblastoma, melanomas, mesothelioma, metastatic squamous neck cancer with occult primary, mouth cancer, multiple endocrine neoplasia syndrome, myelodysplastic syndromes, myeloid leukemia, nasal cavity and paranasal sinus cancer, nasopharyngeal carcinoma, neuroblastoma, non-Hodgkin lymphoma, non-small cell lung cancer, oral cancer, oropharyngeal cancer, osteosarcoma/malignant fibrous histiocytoma of bone, ovarian cancer, ovarian epithelial cancer, ovarian germ cell tumor, pancreatic cancer, pancreatic cancer islet cell, paranasal sinus and nasal cavity cancer, parathyroid cancer, penile cancer, pharyngeal cancer, pheochromocytoma, pineal astrocytoma, pineal germinoma, pituitary adenoma, pleuropulmonary blastoma, plasma cell neoplasia, primary central nervous system lymphoma, prostate cancer, rectal cancer, renal cell carcinoma, renal pelvis and ureter transitional cell cancer, retinoblastoma, rhabdomyosarcoma, salivary gland cancer, sarcomas, skin cancers, skin carcinoma Merkel cell, small intestine cancer, soft tissue sarcoma, squamous cell carcinoma, stomach cancer, T-cell lymphoma, throat cancer, thymoma, thymic carcinoma, thyroid cancer, trophoblastic tumor (gestational), cancers of unknown primary site, urethral cancer, uterine sarcoma, vaginal cancer, vulvar cancer, Waldenström macroglobulinemia, and Wilms tumor. Examples of diseases or conditions in which an integrative, composite biomarker can be employed include hematological malignancies, solid tumor malignancies, metastatic cancer, and benign tumors.

A plurality of subjects afflicted with cancers can benefit from the use of an integrative, composite biomarker. Subjects can be humans; non-human primates such as chimpanzees and other ape and monkey species; farm animals such as cattle, horses, sheep, goats, and swine; domestic animals such as rabbits, dogs, and cats; laboratory animals including rodents, such as rats, mice, and guinea pigs; and the like. A subject can be of any age. Subjects can be, for example, elderly adults, adults, adolescents, pre-adolescents, children, toddlers, or infants.

Patient health or treatment options may be assessed by providing a bodily fluid or tissue sample from a subject; collecting a genomic and proteomic profile from the bodily fluid or tissue sample; and comparing the genomic and proteomic profiles to at least one reference profile to assess the health of the subject. The reference profile may profile at least one of: one or more diseases, injuries, or disorders. The reference profile may be established from the genomic or proteomic profile collected from subjects with the same disease, from a healthy population, or both. The method may comprise monitoring by repeatedly comparing, over time, the genomic or proteomic profile to the reference profile. Aspects of the present disclosure may comprise statistically analyzing differences between a tumor profile and reference profile to identify at least one biomarker. Biomarkers or a group of biomarkers having a significance level of less than 95%, 97%, 98%, or 99% may be rejected.

In some aspects, the present disclosure may provide a method of adaptive immunotherapy for the treatment of cancer in a subject comprising administering a first course of a first immunotherapy compound to the subject; acquiring comprehensive tumor and immune related molecular information relating to additional emerging and investigational biomarkers such as neoantigen burden, HLA genotype diversity, HLA loss of heterozygosity, immune repertoire profiles, immuno-cellular deconvolution, oncoviruses, and more; and administering a second course of immunotherapy, wherein the second course of immunotherapy comprises a second immunotherapy compound if the tumor and immune related molecular profile is indicative of an insufficient response to the first immunotherapy compound, or a second course of the first immunotherapy compound if the tumor and immune related molecular profile is not indicative of an insufficient response to the first immunotherapy compound. One or more biological samples acquired after administering a first dose of a first course of a first immunotherapy compound may be acquired on the same day that a subsequent dose of the first course of the first immunotherapy compound is administered.

Treatment, testing, or analysis may be provided to the subject before clinical onset of disease. Treatment, testing, or analysis may be provided to the subject after clinical onset of disease. Treatment, testing, or analysis may be provided to the subject after 1 day, 1 week, 6 months, 12 months, or 2 years after clinical onset of the disease. Treatment, testing, or analysis may be provided to the subject for more than 1 day, 1 week, 1 month, 6 months, 12 months, 2 years or more after clinical onset of disease. Treatment, testing, or analysis may be provided to the subject for less than 1 day, 1 week, 1 month, 6 months, 12 months, or 2 years after clinical onset of the disease. Treatment, testing, or analysis may also include treating, testing, or analyzing a human in a clinical trial.

VI. Experimental Results for Generating a Composite Biomarker Score

To demonstrate the effectiveness of the composite biomarker score in predicting immune-system response to immunotherapies, the following experiment was conducted. Paired pretreatment formalin-fixed paraffin-embedded tumor and normal blood samples were collected and profiled to produce comprehensive tumor mutation information, gene expression quantification, neoantigen characterization, HLA (typing, mutation, and loss of heterozygosity), TCR repertoire profiling, and tumor microenvironment profiling. These data were then compared together with clinical outcome, at which point the composite neoantigen score was computed for each subject along with additional biomarkers such as tumor mutational burden.

A. Cohort Population

1. Discovery Cohort

With respect to the study population, 51 subjects with unresectable, stage III/IV melanoma who underwent treatment were enrolled retrospectively without randomization or blinding. Subjects were treated with either nivolumab (480 mg IV every 4 weeks or 240 mg IV every 2 weeks), a combination of nivolumab and ipilimumab (1 mg/kg IV and 3 mg/kg IV, respectively, every 3 weeks), or pembrolizumab (200 mg IV every 3 weeks). Solid tumor and blood samples were collected within three months prior to treatment start. Computed tomographic scans were performed 10-12 weeks after treatment start, with follow-up scans every three months. Responders were defined as complete response (CR) or partial response (PR). Non-responders were defined as stable disease (SD) or progressive disease (PD).

2. Validation Cohort

Replication of the predicted results was conducted using publicly available NGS data collected from advanced melanoma subjects who underwent immune checkpoint blockade therapy. Whole exome and RNA sequencing data from this study were obtained from dbGaP (NCBI database of Genotypes and Phenotypes, RRID:SCR_002709) (study accession: phs000452.v3.p1). Subjects with mixed responses to therapy (n=2) and low-purity tumors (n=7) were excluded from the analysis, leaving 110 evaluable subjects for validation. Clinical characteristics for the validation cohort are provided in the original study.

3. Statistical Analysis for Clinical Data

With respect to generating clinical data, the Kaplan-Meier method was used to estimate progression-free survival (PFS) and overall survival (OS). Objective response rate was reported as a proportion along with Clopper-Pearson exact CIs. Fisher's exact test and the chi-square test were used to test for associations between groups and categorical variables. When considering the variance between more than two groups, the Kruskal-Wallis H test was used. The Wilcoxon Mann-Whitney rank sum test (MWW) was used for numeric pairwise comparisons. Benjamini-Hochberg correction was used to adjust P values as listed. The Kolmogorov-Smirnov (KS) statistic was used for RNA pathway analyses. Correlations between continuous variables were determined using Kendall's tau. Predictive models were generated using logistic regression, and AUROC was used to determine ability to differentiate between response and non-response according to published methods (28). All tests were two-sided; FDR values of <0.1 for pathway analyses, and P-values of <0.05 for all other tests, were considered statistically significant. The following table provides

B. Cohort Clinical Data

1. Clinical Characteristics Corresponding to Samples of the Discovery Cohort

For the 51 unresectable melanoma subjects in the cohort treated with immune checkpoint blockade, the median follow-up period was 24 months after treatment, with 33 out of 51 subjects (65%, 95% Clopper-Pearson confidence interval of 50-78%) presenting an objective response at first evaluation by Response Evaluation Criteria in Solid Tumors (RECIST) 1.1. Within the clinical cohort, tumors originated in the head and neck region (31%), trunk (31%), extremities (25%), acral areas (6%), mucosa (4%), and occult regions (2%). In addition to these data, sex, age, and other subject-specific demographic information is presented. The following table provides a summary of various characteristics associated with subjects of the clinical cohort:

TABLE 1
Discovery Cohort Characteristics

Characteristic                        Responder (n=33)   Non-responder (n=18)   P-value
Age at treatment                      68 (61-81)         66.5 (56.5-76)         0.4536
Disease origin                                                                  0.9901
  Acral                               2 (6.1%)           1 (5.6%)
  Extremity                           8 (24.2%)          5 (27.8%)
  Head and neck                       11 (33.3%)         5 (27.8%)              0.926
  Mucosal                             1 (3%)             1 (5.6%)
  Trunk                               1 (3%)             0 (0%)
PD1 therapy
  Nivolumab                           10 (30.3%)         6 (33.3%)              0.8976
  Nivolumab (in combination
    with ipilimumab)                  8 (24.2%)          6 (33.3%)              0.7137
  Pembrolizumab                       5 (15.2%)          2 (11.1%)
Sex
  Female                              1 (3%)             0 (0%)
  Male                                19 (57.6%)         10 (55.6%)             0.7513
Stage at treatment
  Unresectable III                    9 (27.3%)          6 (33.3%)              0.8947
  M1a                                 24 (72.7%)         12 (66.7%)             0.8947
  M1b                                 1 (3%)             1 (5.6%)               0.2068
  M1c                                 5 (15.2%)          0 (0%)                 0.2127

As shown in Table 1, there was no statistically significant difference in objective response rates between sites of disease origin. Further, 11 subjects (22%) had progressed following prior treatment with a checkpoint inhibitor, whereas 40 (78%) were naive to immune checkpoint blockade. Subjects were administered either pembrolizumab (n=29, 57%), nivolumab (n=15, 29%), or a combination of nivolumab and ipilimumab (n=7, 14%).
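As a non-limiting illustration, a Clopper-Pearson exact interval of the kind reported for the objective response rate (33 responders among 51 subjects) can be computed from the binomial distribution. The sketch below uses only the standard library and finds each bound by bisection; the function names are introduced for this example and are not part of the disclosure.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a monotone function f on [lo, hi] with a sign change."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""
    # Lower bound: p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


rate = 33 / 51
ci_low, ci_high = clopper_pearson(33, 51)
```

In practice a statistics library's beta quantile function would normally be used in place of the bisection; the bisection is shown only to keep the sketch self-contained.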

2. Genomic Data Corresponding to Samples of the Discovery Cohort

Mutations associated with responding and non-responding tumors were investigated, revealing no significant single-gene predictors of response following multiple hypothesis correction (subject-level mutation data). The following table lists log2 fold changes and adjusted p-values that compare responders and non-responders for each identified gene in the clinical cohort:

TABLE 2

Gene            log2 fold change   adj. p-value
C1ORF204        −1.61163           0.018108
CRHR1-IT1       −1.07242           0.03795
CTD-2201118     −1.60283           0.018993
CXORF22         −2.36416           0.026933
DLL3            3.281187           0.000492
DRD4            −1.47017           0.002091
JMJD6           0.581033           0.018885
IL12RB2         1.805639           0.010296
NEFH            2.332242           0.040698
KCNK10          −1.65424           0.043177
VSIG1           −1.68421           0.009372
GLRA2           4.046629           0.002194
SALL1           −2.69747           0.009372
CEMIP           −1.82324           0.00129
RSPH6A          −1.501             0.030373
LRP2BP          −0.7731            0.025208
MGP             2.455289           0.003771
FGF12           −2.42734           0.000436
APC2            −1.94058           0.009374
GBP1            1.433309           0.046076
TNNT2           −2.13612           0.018993
IRF1            1.215632           0.026263
AMOT            1.175707           0.00657
RASL11B         −1.60126           0.034151
FCHO1           −1.19714           0.030651
IDO1            1.939955           0.032344
GUCY2D          −1.26331           0.048385
ZNF767P         −1.72851           0.030087
CATSPERB        −1.81093           0.021066
TTC9            −1.57921           0.024555
CRB1            2.09183            0.043581
BAAT            3.01605            0.015062
PI15            2.534018           0.016874
ARHGAP20        1.468957           0.037466
CXCL9           2.198974           0.002194
LMAN1L          −1.63068           0.030651
MYLK3           −1.64958           0.02731
ZNF385B         −1.99676           0.02731
STXBP5L         −3.24565           0.000812
RNF175          2.592116           0.015062
CRB2            −1.98509           0.016586
GLB1L2          −1.76366           0.035468
TAOK2           −0.84718           0.048927
GPM6A           −3.0352            0.001369
ASTN1           −2.0683            0.016683
GDPD1           1.373747           0.003771
CDH12           −2.8287            0.009372
EPHB1           −1.86788           0.010296
PPM1J           −1.34045           0.006808
PAQR6           −0.97253           0.042537
SHANK1          −1.607             0.030373
FGF11           −1.54424           0.014214
ANKRD55         −1.79874           0.037442
SMCO2           −1.66042           0.030505
ELFN2           2.718762           0.009374
CXCL10          1.87592            0.00657
CXCL11          2.002743           0.005084
KRT86           3.173565           0.003247
KRT72           −3.6212            0.000436
ZNF540          −1.09009           0.0358
AQP4            −4.01248           0.002491
MBOAT1          0.699832           0.032344
SUSD5           2.027636           0.030651
SLC16A13        −0.93716           0.048528
DOK7            −1.8985            0.026074
ZNF575          −0.68771           0.036673
DSCAML1         −2.13465           0.024377
CITED4          −2.07691           0.005867
CSNK1A1L        −1.20759           0.037437
SYNE4           −1.24594           0.034861
SAGE1           4.264828           0.009372
IZUMO1          −1.307             0.027117
ZDHHC23         −1.45714           0.032344
LCN12           −1.58892           0.005125
KRT73           −3.12158           0.001387
NANOS3          −1.66023           0.030373
ANKRD19P        −2.57316           0.000492
SLC38A3         −2.25653           0.036738
HBA2            −1.70692           0.017059
ZNF560          −2.87698           0.009372
CACNA1E         1.832572           0.041874
SNORA60         −1.39145           0.015062
HSPA1L          −1.07768           0.018366
APOM            −1.02226           0.026263
KRT81           3.180689           0.009372
IGLL3P          −3.76388           0.002117
EGFEM1P         2.988888           0.002117
HBA1            −1.65165           0.021066
UBD             1.715684           0.021069
SNORA7B         −1.32328           0.027766
SNORA68         −1.46923           0.041874
ARHGEF35        −2.07861           0.02636
PPM1N           −1.00687           0.048927
LTB4R2          −1.42895           0.037079
TTC3P1          −0.89718           0.030087
SNORA77         −1.30542           0.035468
LINC00115       −0.71744           0.048482
TMEM238         −1.58165           0.02258
UGT1A5          −2.27141           0.003063
UGT1A3          −2.53733           0.000679
UPK3B           −1.23583           0.018108
LCE3C           −5.31202           0.026263
HBB             −1.97396           0.002194
ANP32C          −1.25361           0.015062
TGFBR3L         −1.68206           0.018807
MIR5690         −1.82684           0.009372
C2orf15         −0.98529           0.04459
NEFL            −2.51474           0.015062
FLJ39080        2.796046           0.021066
LMTK3           −1.25002           0.012716
LOC100131691    −1.45461           0.042537
LOC100289580    −1.29238           0.027375
LOC100506388    −1.53284           0.030651
LOC100652768    −1.05443           0.024471
LOC100653133    −1.9145            0.018993
LOC101927372    −1.30701           0.041557
LOC101928179    −1.66223           0.030505
LOC101928457    −5.60655           0.000436
LOC101928577    −1.3549            0.009372
LOC399900       −0.85131           0.043177
TCL1A           −1.87922           0.032344

3. Immune Pathway Data Corresponding to Subjects of the Cohort

Next, genetically disrupted pathways corresponding to the clinical data were determined. The most frequently disrupted pathways included the RTK-RAS and WNT pathways (disrupted in 73% and 51% of the cohort, respectively). Mutations were detected throughout the RTK-RAS pathway. Numerous RTKs were mutated, including ROS1 and ERBB4, as were RAS family genes, including NRAS, BRAF, and MAPK1 and 2.

FIGS. 2A-B show statistical data corresponding to oncogenic changes in genomic and transcriptomic data corresponding to subjects of a clinical cohort. FIG. 2A shows mutations in known oncogenic pathways in late-stage melanoma subjects. In FIG. 2A, fraction of pathway affected denotes the number of genes mutated within the pathway (n=51 samples included in this analysis). FIG. 2B shows a visualization of mutations occurring within the RTK-RAS pathway. Tumor suppressor genes are listed in red, and oncogenes are shown in blue. Dots represent absence of mutation within the specified gene. Each column represents a tumor, with green blocks representing variants within a given gene.

C. Transcriptomic Metrics

1. Differential Gene Expressions

Transcriptomic data was generated for each subject in the discovery cohort. From the transcriptomic data, various transcriptomic metrics were generated. For example, 121 differentially expressed genes were identified in responding subjects (n=48 evaluable subjects; adjusted P-value ≤0.05, log fold change >2 or <−0.5). FIGS. 3A-C show statistical data corresponding to transcriptomic metrics that identify differentially expressed genes that are associated with immune-system response. FIG. 3A shows the 50 genes with the highest levels of differential expression in the cohort, in which fold change has been provided to compare responding subjects to non-responding subjects. FIG. 3A further shows Benjamini-Hochberg corrected P values below 0.05 for each gene of a corresponding set of 48 genes. Although not all are shown in FIG. 3A, enrichment was observed in 29 of these genes, while reduced expression was observed in 92 genes. To illustrate, FIG. 3B shows a heatmap of differentially expressed genes for each subject of the clinical cohort. In FIG. 3B, each column represents a subject, and each row represents a gene.
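As a non-limiting illustration, the Benjamini-Hochberg correction referred to above, followed by selection of genes at an adjusted P-value cutoff of 0.05, can be sketched as follows. The gene labels and raw P values are made up for this example, and the function name is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the
    # adjusted values with a running minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw P values for four placeholder genes.
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [g for g, q in zip(genes, adj_p) if q <= 0.05]
```

A fold-change filter, such as the log fold change thresholds recited above, would be applied to the same gene list as a second condition.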

The most strongly upregulated genes included delta-like ligand 3 (DLL3; log2 fold change=3.28; FDR-adjusted P=0.0005), which is an inhibitory Notch ligand that exhibits high expression in small cell lung cancer and other tumor tissues. Because of its low cytoplasmic expression in normal tissue compared to elevated, homogeneous cell surface expression in tumors, the delta-like ligand 3 gene is currently under investigation as a possible therapeutic target. Additionally, four members of the keratin (KRT) family (KRT72, 73, 81, 86), which is a gene group identified to have extensive ties to cancer development, had altered expression levels when comparing responders and non-responders. Validation of gene expression analysis results for DLL3 and KRT family genes confirmed significance of DLL3 (MWW P=0.02), but not KRT72, 73, 81, 86 (MWW P=0.44, P=0.41, P=0.6, and P=0.17). Such a difference in validation results may be due to reduced sensitivity in determining differential expression in individual genes.

Though not significantly enriched at a cohort level, IDO1 expression was detected at very high levels in three subjects (median IDO1 TPM=10.36; outlier IDO1 TPM=1955, 661, and 451). To illustrate, FIG. 3C shows a set of box plots that compare IDO1 gene expression levels of responsive subjects and those of non-responsive subjects. The gene expression values were provided in units of Transcripts Per Kilobase Million. For the group of responsive subjects, three outlier subjects were identified. Although the IDO1 expression values did not appear to have a relationship with response to immunotherapies, these outliers with elevated levels of expression may indicate escape mechanisms that prevented complete response in corresponding subjects (n=48). For example, two of the subjects overexpressing IDO1 may have failed to achieve complete response to immunotherapies, possibly due to an IDO1-driven immunosuppressive environment.

2. Gene Enrichment Analysis

Next, gene set enrichment analysis was performed to identify differentially regulated pathways in the clinical cohorts. FIG. 4 shows statistical data corresponding to a normalized enrichment score for each differentially regulated immune pathway, in which the normalized enrichment scores are generated based on a gene-set enrichment analysis. In FIG. 4, significant enrichment of pathways related to immune function was identified among responsive subjects with up-regulated genes. Benjamini-Hochberg corrected P values below 0.05 are shown. Inflammatory signaling cascades were among the most highly enriched of those profiled (significance set as FDR<0.1). Activation of immune pathways likely resulted from other enriched pathways. For example, cellular differentiation of Th17 can be driven by: (i) the cytokine TGF-β, which induces RORγt in Th17 cells; and (ii) IL-6, which induces the Th17 lineage. The observed enrichment of Th17 may also be positively regulated by the observed increase in STAT3 signaling, which serves to promote Th17 differentiation.

3. Expression Levels of T-Cell Receptors

FIGS. 5A-C show statistical data corresponding to transcriptomic metrics that identify expression levels of T-cell receptors. The adaptive immune system can respond to a broad array of antigens due to its large repertoire of unique T-cell receptors (TCRs). The box plots in FIGS. 5A-C cover the interquartile range from the 25th percentile at their lower bound to the 75th percentile at their upper bound, with the median indicated by a horizontal line. The upper whisker includes the largest value within 1.5× interquartile range above the 75th percentile. The lower whisker includes the smallest value within 1.5× interquartile range below the 25th percentile. In order to characterize the pretreatment tumor-immune landscape, TCR-β repertoire diversity was identified in a subset of subjects in the cohort (n=28 subjects). Clonality was determined for the clonal abundance of all productive TCR-β sequences using 1-Pielou's evenness. As intra-tumoral heterogeneity is considered to be a determinant of immune response, mutant-allele tumor heterogeneity (MATH) scores, which indicate an estimated level of tumor heterogeneity, were compared with the identified TCR-β clonality. FIG. 5A shows a set of box plots that identify a comparison of TCR-β clonality between low and high mutant-allele tumor heterogeneity levels. As shown in FIG. 5A, a significant association (MWW, P=0.014) was identified between high tumor heterogeneity and clonal diversity of the TCR-β repertoire.
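As a non-limiting illustration, clonality computed as 1 minus Pielou's evenness can be sketched from per-clonotype read counts. Pielou's evenness is the Shannon entropy of the clone frequencies divided by the natural logarithm of the number of distinct clones. The function name, the input format, and the convention of returning 1.0 for a single-clone repertoire are assumptions made for this sketch.

```python
from math import log


def tcr_clonality(clone_counts):
    """Clonality = 1 - Pielou's evenness over productive clonotype counts."""
    counts = [c for c in clone_counts if c > 0]
    if not counts:
        raise ValueError("at least one clone with a positive count is required")
    if len(counts) == 1:
        # Assumed convention: a single-clone repertoire is maximally clonal.
        return 1.0
    total = sum(counts)
    entropy = -sum((c / total) * log(c / total) for c in counts)
    evenness = entropy / log(len(counts))
    return 1.0 - evenness


even_repertoire = tcr_clonality([10, 10, 10, 10])
skewed_repertoire = tcr_clonality([97, 1, 1, 1])
```

Values near 0 indicate an even repertoire with no dominant clone, and values approaching 1 indicate a repertoire dominated by one or a few expanded clones.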

FIG. 5B shows a set of box plots that identify a comparison of TCR-β clonality between a first group of subjects identified as being responsive to immunotherapies and a second group of subjects identified as being non-responsive to the immunotherapies. As shown in FIG. 5B, TCR-β clonality is elevated in responding subjects, compared to non-responders (n=28; MWW; P=0.047). Thus, TCR-β clonality can be considered to be significantly associated with therapy outcome. Further, FIG. 5C shows a line plot that identifies a comparison of progression-free survival probability between a first group identified to have high TCR-β clonality and a second group identified to have low TCR-β clonality. FIG. 5C shows that significantly longer progression-free survival was observed in high clonality subjects when compared to those with low clonality (two-sided KM log-rank test, P=0.0043), in which high/low stratification was calculated independently for old/young populations (median cohort age used as cut point).

4. Immune Infiltrate Signatures

Characterization of immune and stromal cell populations within the tumor microenvironment in the cohort was implemented. The generated data were used to produce semi-quantitative immune infiltration scores. FIG. 6 shows a set of box plots that identify a comparison of enrichment scores between a first group of responsive subjects and a second group of non-responsive subjects. The comparison of enrichment scores was identified across various types of tumor infiltrating lymphocytes, including regulatory T-cell (TREG), natural killer cell (NK cell), and cancer associated fibroblast (CAF). As shown in FIG. 6, responding and non-responding subjects largely shared similar distributions of immune cell expression across various types. Thus, the gene expression levels of immune cells, in isolation, do not appear to be a strong predictive indicator of responsiveness levels to immunotherapies. However, as described herein, the expression levels can be a contributing factor in generating the composite biomarker score that accurately predicts responsiveness to the immunotherapies.

5. Neoantigen Burden

A neoantigen-based biomarker approach achieves a strong correlation with response to immune checkpoint blockade. With respect to this particular exemplary experiment, two different neoantigen models were generated so that their respective performance levels could be compared. A first neoantigen model corresponded to a score based on neoantigen burden only, and a second neoantigen model corresponded to the first model extended to account for impairment to neoantigen presentation and other established resistance markers. The second neoantigen model thus corresponded to a model for generating the composite biomarker score.

To calculate the neoantigen burden score, features derived from exome and transcriptome data were used. Putative neoepitopes were predicted from single-nucleotide variants, indels, and fusions detected from both exome and transcriptome sequencing. To improve MHC class I neoantigen prediction, mass spectrometry-based peptide binding data from mono-allelic HLA transfected cell lines were generated. These data were used to train an improved machine learning algorithm that integrates HLA binding, proteasomal cleavage, and gene expression information to improve neoantigen prediction.

FIGS. 7A-B show statistical data corresponding to transcriptomic metrics that identify neoantigen burden across various genes and disease sites. FIG. 7A shows a set of box plots that identify neoantigen burden scores for tumors grouped by driver mutation, namely BRAF, NRAS, NF1, and WT. FIG. 7A shows that neoantigen burden varied significantly between tumors harboring different driver mutations, revealing significant variation amongst subtypes (Kruskal-Wallis, P=1e-04).

In addition, FIG. 7B shows a set of box plots that identify neoantigen burden scores corresponding to various disease sites of melanoma, including acral, extremity, head/neck, mucosal, trunk, and occult regions. In FIG. 7B, a significant association across disease sites of origin was not detected (Kruskal-Wallis, P=0.08). Thus, neoantigen burden did not vary globally when comparing tumors arising from different sites of origin, although it can be observed that post hoc comparison between acral and trunk melanomas did reveal significant variation (MWW; P=0.047).

FIGS. 8A-F show statistical data identifying neoantigen burden scores across various subjects, in which the neoantigen burden score can be predictive of responsiveness of subjects treated with immunotherapies. FIG. 8A shows a set of box plots corresponding to a comparison of neoantigen burden scores between a first group of subjects that responded to immunotherapies and a second group of subjects that did not respond to the immunotherapies. In FIG. 8A, each box plot covers the interquartile range from the 25th percentile at its lower bound to the 75th percentile at its upper bound, with the median indicated by a horizontal line. The upper whisker includes the largest value within 1.5× interquartile range above the 75th percentile. The lower whisker includes the smallest value within 1.5× interquartile range below the 25th percentile. It was found that neoantigen burden is significantly higher in responding subjects compared to non-responding subjects (n=51; MWW; P=0.016). FIG. 8B shows a set of box plots corresponding to a comparison of neoantigen burden scores of subject groups in the validation cohort (e.g., responsive subjects, non-responsive subjects). The data from the validation cohort in FIG. 8B confirm that subjects who responded to therapy presented significantly higher neoantigen burden (MWW; P=0.021).
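The box and whisker bounds described above can be computed as in the following Python sketch. The function name is hypothetical, and the quartile interpolation convention (the "inclusive" method of the standard library) is an assumption, since the disclosure does not state which convention was used to draw the figures.

```python
import statistics


def box_plot_stats(values):
    """Compute box bounds and 1.5x IQR whiskers for one group of values."""
    data = sorted(values)
    # Assumed quartile convention: inclusive interpolation between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme observed values inside the fences.
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical scores with one extreme value.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In this example the value 100 lies beyond the upper fence, so the upper whisker stops at 8 and 100 is drawn as an outlier.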

Other types of experimental data also indicate that a higher neoantigen burden score is associated with responsiveness to immunotherapies. FIG. 8C shows a line plot that identifies a comparison of progression-free survival probability between a first group identified to have high neoantigen burden and a second group identified to have low neoantigen burden. As shown in FIG. 8C, significantly longer progression-free survival was observed in subjects with high neoantigen burden when compared to those with low neoantigen burden (two-sided KM log-rank test; P=0.002). FIG. 8D shows a line plot that identifies a comparison of progression-free survival probability between subject groups in the validation cohort, and FIG. 8E shows a line plot that identifies a comparison of overall survival rate between subject groups in the validation cohort. Although FIG. 8D shows that progression-free survival of subjects with high neoantigen burden was not significantly longer than that of subjects with low neoantigen burden in the validation cohort (two-sided KM log-rank test, P=0.085), FIG. 8E shows that marked improvements to overall survival were observed in subjects with high neoantigen burden (two-sided KM log-rank test, P=0.005).

FIG. 8F shows a receiver operating characteristic curve that identifies performance levels of the neoantigen burden score model. As shown in FIG. 8F, the area under curve value for the neoantigen burden score model was 0.71 and the cross-validation area under curve value (mean) was 0.69 (log-likelihood ratio P=0.0329).
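For readers unfamiliar with the area under curve value, the following Python sketch computes it directly from scores of responders and non-responders, using the equivalence between the area under the receiver operating characteristic curve and the probability that a randomly chosen responder scores higher than a randomly chosen non-responder. The function name and the example scores are hypothetical; the sketch does not reproduce the model or the cross-validation used in the experiment.

```python
def roc_auc(responder_scores, non_responder_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    if not responder_scores or not non_responder_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for pos in responder_scores:
        for neg in non_responder_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                # Ties count as half a correctly ordered pair.
                wins += 0.5
    return wins / (len(responder_scores) * len(non_responder_scores))


# Hypothetical burden scores for three responders and three non-responders.
auc = roc_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.5])
```

A value of 0.5 corresponds to no discrimination between the groups, and a value of 1.0 corresponds to perfect separation.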

D. Genomic Metrics

1. Mutation Characteristics

In addition to the transcriptomic data, genomic data were generated for each subject in the discovery cohort. From the genomic data, various genomic metrics were generated. FIGS. 9A-F show statistical data that identify one or more characteristics relating to mutations present in each subject sample of the discovery cohort. FIG. 9A identifies mutations in various genes of subjects receiving anti-PD-1 therapy. In FIG. 9A, the top box plot represents mutational load. The tiled plot shows mutated genes (rows) by sample (columns), with tile color indicating mutation type. The box plot to the right represents the number of subjects with mutations in the specified gene, colored to indicate mutation type. Under the tiled plot, the first line represents therapeutic response, as either response (partial or complete response; dark green; n=33), or non-response (black; n=18).

In FIG. 9A, median nonsynonymous tumor mutational burden was 4.07 mutations/Mb (interquartile range, 0.95-12.455). This genomic metric appears to be consistent with values observed in known datasets. For example, FIG. 9B shows the number of mutations identified in each sample across various datasets. Levels of mutational burden in the discovery cohort are comparable to those in the TCGA-SKCM dataset (melanoma). In FIG. 9B, each dot represents a sample, with red horizontal lines at the median numbers of mutations in each cancer type. The (log scaled) vertical axis shows the number of mutations per sample.
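A tumor mutational burden expressed in mutations per megabase can be computed as in the following Python sketch. The function name, the notion of a "callable" footprint (the number of sequenced bases over which variants could be called), and the example numbers are assumptions for illustration; the disclosure does not specify the denominator used.

```python
def tumor_mutational_burden(n_nonsynonymous, callable_bases):
    """Return nonsynonymous mutations per megabase of callable sequence."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    megabases = callable_bases / 1_000_000
    return n_nonsynonymous / megabases


# Hypothetical sample: 140 nonsynonymous mutations over a 35 Mb footprint.
tmb = tumor_mutational_burden(140, 35_000_000)  # 4.0 mutations/Mb
```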

FIG. 9C shows a set of box plots that identify the number of mutations for each type of single-nucleotide variant and a bar graph showing a distribution of types of single-nucleotide variants for each subject in the discovery cohort. In FIG. 9C, single-nucleotide variants were classified as either transitions or transversions (n=49). The left box plot shows the overall distribution of six different substitution types, while the right box plot shows the distribution of transitions (Ti) and transversions (Tv). As shown in FIG. 9C, C>T transitions appear to form the bulk of identified single-nucleotide variants (76%).
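The classification of single-nucleotide variants into six substitution types and into transitions and transversions can be sketched in Python as follows. The sketch follows the common convention of reporting each substitution relative to the pyrimidine (C or T) of the reference base pair; the function names and the example variants are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Transitions exchange one pyrimidine for the other (or purine for purine).
TRANSITIONS = {"C>T", "T>C"}


def substitution_type(ref, alt):
    """Collapse a single-base change onto one of six pyrimidine-based types."""
    if ref in ("A", "G"):
        # A purine reference is reported on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def summarize_snvs(snvs):
    """Count substitution types, transitions, and transversions."""
    types = Counter(substitution_type(ref, alt) for ref, alt in snvs)
    n_transitions = sum(n for t, n in types.items() if t in TRANSITIONS)
    n_transversions = sum(types.values()) - n_transitions
    return types, n_transitions, n_transversions


# Hypothetical variants given as (reference base, alternate base).
types, ti, tv = summarize_snvs([("C", "T"), ("G", "A"), ("A", "G"), ("C", "A")])
```

In this example, G>A is counted with C>T and A>G is counted with T>C, giving three transitions and one transversion.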

FIG. 9D shows a bar graph identifying a distribution of three mutational signatures for each subject in the discovery cohort. Signatures were extracted by decomposing a matrix of nucleotide substitutions, classified into 96 substitution classes based on the bases immediately surrounding the mutated base, resulting in three primary signatures within the cohort. According to FIG. 9D, the most commonly identified driver mutation occurred in BRAF, in 33% of subjects, followed by NRAS in 20% and NF1 in 16% of the population. FIG. 9E shows a distribution of mutations of the three primary signatures. Extracted signatures were compared to previously validated signatures. Signatures 1 and 2 in the discovery cohort are most similar to a UV signature, while the third signature is most closely associated with a signature of unknown etiology. As shown in FIG. 9E, mutational signatures found in the discovery cohort most strongly associate with UV-induced DNA damage.
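The 96 substitution classes arise from six substitution types combined with four possible bases on the 5' side and four on the 3' side of the mutated base. The following Python sketch enumerates the classes and assigns a variant to its class; the function names and the bracketed label format are illustrative assumptions, and the matrix decomposition step itself is not shown.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def all_classes():
    """Enumerate the 6 x 4 x 4 = 96 trinucleotide substitution classes."""
    return [
        f"{five}[{sub}]{three}"
        for sub in SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def classify(context, alt):
    """Label a variant from its 3-base reference context and alternate base."""
    five, ref, three = context
    if ref in "AG":
        # Report purine references on the reverse-complement strand.
        five, ref, three = COMPLEMENT[three], COMPLEMENT[ref], COMPLEMENT[five]
        alt = COMPLEMENT[alt]
    return f"{five}[{ref}>{alt}]{three}"


classes = all_classes()
label = classify("TGA", "A")  # same class as a C>T change in context TCA
```

Counting the labels per sample yields one 96-element column of the substitution matrix that is then decomposed into signatures.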

FIG. 9F shows a bar graph that identifies, for each driver mutation associated with a particular tumor (e.g., BRAF, NRAS), a distribution of subjects corresponding to various levels of responsiveness to immunotherapies. For example, responders were defined as complete response (CR) or partial response (PR). Non-responders were defined as stable disease (SD) or progressive disease (PD). A driver mutation can refer to a gene alteration that gives cancer cells a fundamental growth advantage for their neoplastic transformation. In FIG. 9F, subjects harboring BRAF mutated tumors were more likely to positively respond to therapy (n=47; exact binomial test; P=0.0258). Response rate for the different genomic subtypes did not significantly vary from the expected response rate. The elevated number of subjects with progressive disease in the WT group likely arises from the reduced frequency of BRAF mutations, which are typically observed at higher rates.
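An exact binomial test of the kind referenced above compares an observed number of responders in a subgroup with an expected response rate. The following Python sketch implements a two-sided version that sums the probabilities of all outcomes no more likely than the observed one. The function names and the example counts are hypothetical, and the expected rate used in the experiment is not stated in the disclosure, so the reported P value is not reproduced here.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_test(k, n, p):
    """Two-sided exact binomial P value for k successes in n trials."""
    observed = binomial_pmf(k, n, p)
    # Small relative tolerance guards against floating-point ties.
    threshold = observed * (1 + 1e-7)
    total = sum(
        binomial_pmf(i, n, p)
        for i in range(n + 1)
        if binomial_pmf(i, n, p) <= threshold
    )
    return min(1.0, total)


# Hypothetical subgroup: 5 responders out of 5 subjects, expected rate 0.5.
p_value = exact_binomial_test(5, 5, 0.5)  # 0.0625
```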

2. Tumor Mutational Burden

FIG. 10 shows sets of box plots 1000 that identify tumor mutational burden across various driver mutations, disease sites, and subject groups. The box plots 1000 include box plots 1002, 1004, and 1006. The box plots cover the interquartile range from the 25th percentile at their lower bound to the 75th percentile at their upper bound, with the median indicated by a horizontal line. The upper whisker includes the largest value within 1.5× interquartile range above the 75th percentile. The lower whisker includes the smallest value within 1.5× interquartile range below the 25th percentile. The values corresponding to the tumor mutational burden were plotted on a log10 scale.

The box plots 1002 identify tumor mutational burden for each driver mutation. Tumor mutational burden varied significantly between tumors harboring different driver mutations (Kruskal-Wallis, P=0.00012). The box plots 1004 identify tumor mutational burden for each of the identified sites of disease origin for melanoma. The box plots 1004 show significant global variation of tumor mutational burden across different sites of disease origin, with significant variation found in comparison with melanomas originating in the head and neck (Kruskal-Wallis, P=0.016).

The box plots 1006 identify tumor mutational burden for a first group of subjects that responded to immunotherapy and a second group of subjects that did not respond to the immunotherapy. The comparison of tumor mutational burden in responding vs non-responding subjects revealed significant associations (MWW; P=0.049). However, the relatively small variance between tumor mutational burden in responding and non-responding subjects in this cohort could be due to the confounding effects of melanoma subtype and varying tumor purity, as these measures have recently been shown to limit the effectiveness of tumor mutational burden as a predictive biomarker. Thus, tumor mutational burden alone may not be able to accurately predict responsiveness to immunotherapies.

E. Composite Biomarker Score

As described herein, embodiments of the present disclosure recognize that alterations in the antigen-presenting machinery could interfere with neoantigen presentation. Taking such data into account could improve the performance of predicting responsiveness to immunotherapies, as these alterations have been noted individually to impact subject response to immune checkpoint blockade. Accordingly, the composite biomarker score adjusts the neoantigen burden score to account for subject-specific tumor alterations that could interfere with neoantigen presentation, including HLA mutations, HLA loss of heterozygosity, and B2M mutations.
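One way to picture such an adjustment is sketched below in Python. This is not the scoring model of the disclosure: it is a deliberately simple illustration that assumes each predicted neoantigen is paired with a single presenting HLA allele, that a neoantigen is discounted entirely when its allele is mutated or lost, and that a damaging B2M alteration removes all MHC class I presentation. The function name, the input format, and the peptide identifiers are hypothetical.

```python
def adjusted_neoantigen_burden(neoantigens, damaged_alleles, b2m_damaged=False):
    """Count neoantigens whose presenting HLA allele is assumed intact.

    neoantigens: list of (peptide_id, presenting_allele) pairs.
    damaged_alleles: alleles affected by somatic mutation or loss of
        heterozygosity in the tumor.
    b2m_damaged: True when a damaging B2M alteration was detected.
    """
    if b2m_damaged:
        # Assumption: loss of B2M abolishes all MHC class I presentation.
        return 0
    damaged = set(damaged_alleles)
    return sum(1 for _, allele in neoantigens if allele not in damaged)


# Hypothetical tumor with five predicted neoantigens.
predicted = [
    ("pep1", "HLA-A02:01"),
    ("pep2", "HLA-A02:01"),
    ("pep3", "HLA-B15:01"),
    ("pep4", "HLA-C07:01"),
    ("pep5", "HLA-C07:01"),
]
raw_burden = len(predicted)
adjusted = adjusted_neoantigen_burden(predicted, {"HLA-A02:01", "HLA-B15:01"})
```

Here the raw burden of five falls to two once the neoantigens assigned to the two damaged alleles are discounted, which illustrates how a high raw burden can overstate the number of neoantigens actually available for presentation.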

1. Discovery Cohort

FIGS. 11A-D show statistical data identifying composite biomarker scores across various subjects, in which the composite biomarker scores indicate improved performance in predicting responsiveness of subjects treated with immunotherapies. In particular, FIGS. 11A-D show that the composite biomarker score is more strongly associated with response to immunotherapies than neoantigen burden alone. For example, FIG. 11A shows a set of box plots corresponding to a comparison of composite biomarker scores between a first group of subjects that responded to immunotherapies and a second group of subjects that did not respond to the immunotherapies. As shown in FIG. 11A, the composite biomarker score is significantly higher in responding subjects compared to non-responding subjects (n=51; MWW; P=0.002). Thus, the composite biomarker score resulted in improved prediction of therapy outcome, when compared to neoantigen burden. FIG. 11B shows a set of box plots corresponding to a comparison of composite biomarker scores of subject groups in the validation cohort (e.g., responsive subjects, non-responsive subjects). The data from the validation cohort in FIG. 11B confirm a similar result, in which subjects in the responsive group presented a significantly higher composite biomarker score than the non-responding subjects (n=110; MWW; P=0.010). With reference to FIGS. 11A-B, the corresponding box plots cover the interquartile range from the 25th percentile at their lower bound to the 75th percentile at their upper bound, with the median indicated by a horizontal line. The upper whisker includes the largest value within 1.5× interquartile range above the 75th percentile. The lower whisker includes the smallest value within 1.5× interquartile range below the 25th percentile.

FIG. 11C shows a line plot that identifies a comparison of progression-free survival probability between a first group identified to have high composite biomarker scores and a second group identified to have low composite biomarker scores. As shown in FIG. 11C, significantly longer progression-free survival was observed in subjects with a high composite biomarker score when compared to those with a low composite biomarker score (two-sided KM log-rank test; P=0.0016).

FIG. 11D shows a receiver operating characteristic curve that identifies performance levels of the composite biomarker score model. As shown in FIG. 11D, the composite biomarker model performs better than the neoantigen burden model: area under curve for the composite biomarker score increased to 0.76 from 0.71 and the cross-validation area under curve (mean) increased to 0.75 from 0.69 (log-likelihood ratio P=0.0057).

2. Validation Cohort

FIGS. 12A-B show statistical data identifying composite biomarker scores across various subjects, in which the composite biomarker scores indicate improved performance in predicting progression-free and overall survival rates of subjects in the cohort. In particular, the improvement of performance levels of the composite biomarker score was more noticeable in the validation cohort. FIG. 12A shows a line plot that identifies a comparison of progression-free survival probability between subject groups in the validation cohort, and FIG. 12B shows a line plot that identifies a comparison of overall survival rate between subject groups in the validation cohort. In contrast to what was found for the neoantigen burden score in the validation cohort, FIG. 12A shows that progression-free survival of subjects associated with high composite biomarker scores was significantly longer than that of subjects associated with low composite biomarker scores (two-sided KM log-rank test, P=0.05). As shown in FIG. 12B, greater significance was achieved when analyzing overall survival, in which overall survival was significantly longer in subjects associated with a high composite biomarker score (two-sided KM log-rank test, P=0.002). The improvement with the composite biomarker score can be understood biologically with the finding that 23.5% of subjects in the discovery cohort and 17.27% of subjects in the validation cohort had at least one mechanism potentially affecting antigen presentation, suggesting these features may frequently influence immune-system response to immunotherapies.

3. Mutations in HLA Genes Affecting the Composite Biomarker Score

FIGS. 13A-B show statistical data that identify somatic mutations to HLA genes that may contribute to a decreased probability of neoantigen presentation. In particular, a review of damaging HLA mutations across the discovery cohort revealed deleterious variants in many subjects. For example, FIG. 13A shows examples of somatic variants identified in samples of the discovery cohort. As shown in FIG. 13A, two distinct somatic HLA mutations were found in subject 25, including a stop gain mutation in HLA-A02:01 and a splice region variant in HLA-B15:01 (allele fraction=0.473 and 0.368, respectively). These somatic mutations can lead to the loss of surface expression of HLA-A02:01 and possible misfolding of HLA-B15:01. A damaging frameshift variant was detected in beta-2-microglobulin (B2M) in subject 38, possibly impairing all MHC class I presentation in that subject.

FIG. 13B shows a bar graph that identifies relative frequencies of neoantigens that are presented by respective HLA genes for subject 25 of the discovery cohort. In FIG. 13B, 38.9% of neoantigens (19.1% for A02:01; 19.8% for B15:01) in subject 25 were predicted to bind to the damaged HLA alleles, suggesting potentially severe impairment of neoantigen presentation. Of note, subject 25 was an outlier in the non-responding subjects, with much higher neoantigen burden, suggesting that impaired neoantigen presentation beyond that which is captured in the composite biomarker score may be a contributing factor to immune checkpoint blockade resistance. In another outlier, subject 38 (high neoantigen burden, non-responder), a damaging frameshift variant was detected in B2M at a high allelic fraction, also potentially impacting antigen presentation.

4. HLA Loss of Heterozygosity

HLA loss of heterozygosity was also examined in this cohort, as it can also impact neoantigen presentation. HLA loss of heterozygosity refers to an acquired resistance mechanism that facilitates immune escape by reducing capacity for presentation of tumor neoantigens to the immune system. As the process of HLA loss is governed by selective pressures within the tumor microenvironment, particularly at later stages of tumor evolution, it was hypothesized that within the cohort of late-stage melanoma subjects allele-specific HLA loss of heterozygosity could contribute to reduced therapeutic response despite apparent elevated neoantigen burden.

It was found that HLA loss of heterozygosity was the most prevalent form of HLA disruption, occurring in 19.6% of evaluable subjects (10/51), with three individuals presenting loss of heterozygosity across all non-homozygous HLAs. FIGS. 14A-B show examples of sets of panels that identify a comparison of HLA sequences between a normal sample and a corresponding tumor sample of a particular subject. For example, FIG. 14A shows a set of panels that identify a comparison of HLA-A sequences between the normal and tumor samples of the subject, and FIG. 14B shows a set of panels that identify a comparison of HLA-C sequences between the normal and tumor samples of the subject.

The panels of FIGS. 14A-B provide NGS sequence-based evidence for HLA loss of heterozygosity in HLA-A and HLA-C of subject 54 of the discovery cohort. HLA-B is not shown. The first panel shows the raw read coverage of both homologous alleles in the normal sample. The second panel shows the raw read coverage of both homologous alleles in the tumor sample. Both plots have vertical grey lines representing the positions of difference between the two alleles. Due to strict mapping parameters requiring all reads to map without mismatch, differences in coverage at the grey lines represent true differences in coverage between the alleles. The third panel shows the b-allele frequency from the normal sample (grey) and the tumor sample (black). The b-allele frequency in the tumor sample should be considered in light of the b-allele frequency in the normal sample because of primer hybridization differences between the alleles. The fourth panel shows the ratio in coverage between the tumor and normal samples for each allele. These values have been normalized by the tumor and normal read depth across the whole exome. The expected value with no copy number change is one, shown with a dashed grey line. Both the third and fourth panels show data only for the mismatch positions between the two alleles.
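The depth-normalized coverage ratio plotted in the fourth panel can be sketched in Python as follows. The function name, the argument names, and the example numbers are hypothetical; the sketch assumes that per-position allele coverage and exome-wide read depth are already available for both samples.

```python
def normalized_coverage_ratio(tumor_cov, normal_cov, tumor_depth, normal_depth):
    """Tumor/normal coverage ratio for one allele at one mismatch position.

    Each coverage is first divided by the exome-wide read depth of its own
    sample, so that a value of 1.0 is expected with no copy number change.
    """
    if normal_cov <= 0 or tumor_depth <= 0 or normal_depth <= 0:
        raise ValueError("normal coverage and read depths must be positive")
    return (tumor_cov / tumor_depth) / (normal_cov / normal_depth)


# Hypothetical position: the tumor was sequenced more deeply than the normal.
retained = normalized_coverage_ratio(50, 40, 100, 80)  # 1.0, no change
lost = normalized_coverage_ratio(10, 40, 100, 80)      # 0.2, reduced coverage
```

A ratio that stays well below one across the mismatch positions of one allele, while the other allele remains near one, is the pattern consistent with allele-specific loss.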

As shown in FIGS. 14A-B, matched normal tissue from the subject generally presents even allele-specific coverage across HLA genes A and C. In contrast, tumor tissue from this subject exhibits broad imbalances in allele-specific coverage spanning large portions of each HLA, with low levels of coverage in HLA-A01:01 and HLA-C07:01. The b-allele frequency shows an absolute difference from the normal. A consistently lower ratio of coverage is observed in the lost alleles (fourth panels in FIGS. 14A-B), which are predicted to present ˜54% of this subject's neoantigens, likely reducing capacity for presentation to the immune system.

VII. Process for Generating a Composite Biomarker Score

FIG. 15 includes a flowchart 1500 illustrating an example of a method of generating a composite biomarker score, according to some embodiments. Operations described in flowchart 1500 may be performed by, for example, a computer system implementing one or more operations for generating a composite biomarker score based on transcriptomic and genomic metrics. Although flowchart 1500 may describe the operations as a sequential process, in various embodiments, many of the operations may be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. An operation may have additional steps not shown in the figure. Furthermore, embodiments of the method may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the associated tasks may be stored in a computer-readable medium such as a storage medium.

At operation 1510, an immunogenomics-analysis system accesses genomic data and transcriptomic data that were generated by processing a biological sample of a subject. In some instances, the biological sample includes one or more cancer cells. The genomic data can identify one or more DNA sequences in the biological sample, in which whole-exome sequencing can be performed to identify the one or more DNA sequences. The transcriptomic data can identify one or more RNA sequences in the biological sample, in which transcriptome sequencing can be used to identify the one or more RNA sequences. Additionally or alternatively, the genomic and the transcriptomic data can be generated from a sample pair that includes the biological sample and a reference biological sample of the subject, in which the reference biological sample does not include the one or more cancer cells.

At operation 1520, the immunogenomics-analysis system processes the genomic data to generate a set of genomic metrics. Each of the set of genomic metrics can represent one or more characteristics corresponding to a corresponding DNA sequence of the one or more DNA sequences. In some instances, the set of genomic metrics includes: (i) a quantitative or categorical metric that represents one or more characteristics for each of one or more somatic mutations in the one or more DNA sequences; (ii) a categorical metric that indicates whether a loss of heterozygosity has occurred in at least one human leukocyte antigen (HLA) gene of the biological sample; and (iii) a quantitative or categorical metric that represents a predicted tumor mutational burden. With respect to the HLA loss of heterozygosity, the corresponding categorical metric can be generated by applying the genomic data to an HLA-deletion-identification machine-learning model.

At operation 1530, the immunogenomics-analysis system processes the transcriptomic data to generate a set of transcriptomic metrics. Each of the set of transcriptomic metrics can represent one or more characteristics corresponding to a set of peptides that are translated from a corresponding RNA sequence of the one or more RNA sequences. In some instances, the set of transcriptomic metrics includes: (i) a quantitative or categorical metric that represents a predicted neoantigen burden of the biological sample; (ii) a quantitative or categorical metric that represents one or more characteristics of each of one or more candidate neoantigens detected from the biological sample; (iii) a quantitative or categorical metric that represents one or more characteristics of each of one or more HLA proteins for which a loss of cell-surface presentation is detected; (iv) a quantitative or categorical metric that represents one or more characteristics corresponding to an HLA gene that encodes the one or more HLA proteins for which the loss of cell-surface presentation was detected; (v) a quantitative or categorical metric that represents an expression level of a sequence corresponding to an immune cell; and (vi) a quantitative or categorical metric that represents an expression level of one or more T-cell receptors detected from the biological sample. With respect to the HLA proteins for which a loss of cell-surface presentation is detected, the corresponding metric can be generated by applying the genomic and transcriptomic data to a neoantigen-presentation-prediction machine-learning model.

At operation 1540, the immunogenomics-analysis system generates a composite biomarker score derived from the set of genomic metrics and the set of transcriptomic metrics. In some instances, the immunogenomics-analysis system generates the composite biomarker score by: (i) weighting each genomic metric of the set of genomic metrics with a weight value determined based on a corresponding transcriptomic metric of the set of transcriptomic metrics; and (ii) generating the composite biomarker score using the weighted genomic metrics.
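The weighting described in operation 1540 can be pictured with the following minimal Python sketch, which assumes, purely for illustration, that the weighted genomic metrics are combined by summation. The disclosure does not specify how the weights are derived from the transcriptomic metrics or how the weighted metrics are combined, so the function name, the metric names, and the numeric values are all hypothetical.

```python
def composite_biomarker_score(genomic_metrics, weights):
    """Combine genomic metrics using one weight per metric.

    genomic_metrics: mapping of metric name to numeric value.
    weights: mapping of metric name to a weight that is assumed to have
        been derived from the corresponding transcriptomic metric.
    """
    missing = set(genomic_metrics) - set(weights)
    if missing:
        raise KeyError(f"no weight provided for: {sorted(missing)}")
    # Assumed combination rule: a plain weighted sum.
    return sum(weights[name] * value for name, value in genomic_metrics.items())


# Hypothetical metrics for one subject.
metrics = {"tumor_mutational_burden": 4.0, "hla_loh": 1.0}
weights = {"tumor_mutational_burden": 0.5, "hla_loh": -1.0}
score = composite_biomarker_score(metrics, weights)  # 0.5*4.0 - 1.0*1.0 = 1.0
```

In this toy example the negative weight on the HLA loss-of-heterozygosity indicator lowers the score, mirroring the idea that impaired presentation should reduce the predicted benefit of a given mutational burden.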

At operation 1550, the immunogenomics-analysis system determines, based on the composite biomarker score, a predicted level of responsiveness of the subject to a particular type of an immunotherapy treatment.

At operation 1560, the immunogenomics-analysis system outputs a result that corresponds to the predicted level of responsiveness of the subject. The result can be a report that identifies, based on the predicted level of responsiveness of the subject to the particular treatment: (i) a treatment recommendation of the particular treatment; (ii) a recommendation to administer the particular treatment to the human subject; and/or (iii) a recommendation to not administer the particular treatment to the human subject. In some embodiments, the recommended treatment is administered to the human subject. Process 1500 terminates thereafter.

VIII. Additional Considerations

While the present subject matter has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, it should be understood that the present disclosure has been presented for purposes of example rather than limitation, and does not preclude inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. Indeed, the methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the present disclosure. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the present disclosure.

Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” and “identifying” or the like refer to actions or processes of a computing device, such as one or more computers or a similar electronic computing device or devices, that manipulate or transform data represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform.

The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs. Suitable computing devices include multipurpose microprocessor-based computing systems accessing stored software that programs or configures the computing system from a general purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.

Some embodiments may be performed in the operation of such computing devices. The order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.

Conditional language used herein, such as, among others, “can,” “could,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain examples include, while other examples do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular example.

The terms “comprising,” “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list. The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Similarly, the use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.

The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of the present disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed examples. Similarly, the example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed examples.
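
By way of a purely illustrative, non-limiting sketch, one way a composite biomarker score could be derived from a set of genomic metrics and a set of transcriptomic metrics is to weight each genomic metric by a value determined from a corresponding transcriptomic metric and to sum the weighted values. Everything specific in the sketch is an assumption made for illustration and is not taken from the disclosure: the names `MetricPair`, `weight_from_transcriptomic`, `composite_biomarker_score`, and `predicted_responsiveness`, the clipping of weights to the range 0 to 1, the summation, and the threshold value of 1.0.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MetricPair:
    """A genomic metric paired with its corresponding transcriptomic metric.

    Both fields are hypothetical numeric encodings chosen for illustration.
    """
    genomic: float
    transcriptomic: float


def weight_from_transcriptomic(value: float) -> float:
    """Assumed weighting rule: clip the transcriptomic metric into [0, 1]."""
    return min(max(value, 0.0), 1.0)


def composite_biomarker_score(pairs: Sequence[MetricPair]) -> float:
    """Sum the genomic metrics, each scaled by its transcriptomic-derived weight."""
    return sum(
        p.genomic * weight_from_transcriptomic(p.transcriptomic) for p in pairs
    )


def predicted_responsiveness(score: float, threshold: float = 1.0) -> str:
    """Map a score to a coarse label using an arbitrary, assumed threshold."""
    return "higher" if score >= threshold else "lower"


if __name__ == "__main__":
    example = [
        MetricPair(genomic=1.0, transcriptomic=0.5),
        MetricPair(genomic=2.0, transcriptomic=0.25),
        MetricPair(genomic=1.0, transcriptomic=0.0),  # contributes nothing
    ]
    score = composite_biomarker_score(example)
    print(score, predicted_responsiveness(score))
```

In this sketch, a genomic metric whose corresponding transcriptomic metric is zero contributes nothing to the score. Other weighting rules, aggregation functions, and thresholds could be substituted without changing the overall structure.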

What is claimed is:
 1. A method comprising: accessing genomic data and transcriptomic data that were generated by processing a biological sample of a subject, wherein: the biological sample includes one or more cancer cells; the genomic data identifies one or more DNA sequences in the biological sample; and the transcriptomic data identifies one or more RNA sequences in the biological sample; generating, based on the genomic data, a set of genomic metrics, wherein each of the set of genomic metrics represents one or more characteristics corresponding to a corresponding DNA sequence of the one or more DNA sequences; generating, based on the transcriptomic data, a set of transcriptomic metrics, wherein each of the set of transcriptomic metrics represents one or more characteristics corresponding to a set of peptides that are translated from a corresponding RNA sequence of the one or more RNA sequences; identifying a composite biomarker score derived from the set of genomic metrics and the set of transcriptomic metrics; determining, based on the composite biomarker score, a predicted level of responsiveness of the subject to a particular type of an immunotherapy treatment; and outputting a result that corresponds to the predicted level of responsiveness of the subject.
 2. The method of claim 1, wherein generating the set of genomic metrics comprises determining a quantitative or categorical metric that represents one or more characteristics for each of one or more somatic mutations in the one or more DNA sequences.
 3. The method of claim 1, wherein generating the set of genomic metrics comprises determining a categorical metric that indicates whether a loss of heterozygosity has occurred in at least one human leukocyte antigen (HLA) gene of the biological sample.
 4. The method of claim 3, wherein determining the metric that indicates whether the loss of heterozygosity has occurred comprises applying the genomic data to an HLA-deletion-identification machine-learning model to generate an output that corresponds to the metric indicating whether loss of heterozygosity has occurred.
 5. The method of claim 1, wherein generating the set of transcriptomic metrics comprises determining a quantitative or categorical metric that represents a predicted neoantigen burden of the biological sample.
 6. The method of claim 1, wherein generating the set of transcriptomic metrics comprises determining, based on the genomic data and the transcriptomic data, a quantitative or categorical metric that represents one or more characteristics of each of one or more candidate neoantigens detected from the biological sample.
 7. The method of claim 1, wherein generating the set of transcriptomic metrics comprises generating a quantitative or categorical metric that represents one or more characteristics of each of one or more HLA proteins for which a loss of cell-surface presentation is detected.
 8. The method of claim 7, wherein generating the set of transcriptomic metrics comprises generating, based on the transcriptomic data, a quantitative or categorical metric that represents one or more characteristics corresponding to an HLA gene that encodes the one or more HLA proteins for which the loss of cell-surface presentation was detected.
 9. The method of claim 7, wherein generating the set of transcriptomic metrics comprises applying the genomic data and the transcriptomic data to a neoantigen-presentation-prediction machine-learning model to generate the quantitative or categorical metric that represents the one or more characteristics of each of the one or more HLA proteins.
 10. The method of claim 1, wherein generating the set of transcriptomic metrics includes determining a quantitative or categorical metric that represents an expression level of one or more T-cell receptors detected from the biological sample.
 11. The method of claim 1, wherein the biological sample was collected from a tumor of the subject, and wherein generating the set of transcriptomic metrics includes determining a quantitative or categorical metric that represents an expression level of a sequence corresponding to an immune cell.
 12. The method of claim 1, wherein accessing the genomic data and transcriptomic data comprises using whole-exome sequencing to identify the one or more DNA sequences.
 13. The method of claim 1, wherein accessing the genomic data and transcriptomic data comprises using transcriptome sequencing to identify the one or more RNA sequences.
 14. The method of claim 1, wherein accessing the genomic data and transcriptomic data comprises generating the genomic and the transcriptomic data from the biological sample and a reference biological sample of the subject, wherein the reference biological sample does not include the one or more cancer cells.
 15. The method of claim 1, wherein generating the composite biomarker score includes: weighting each genomic metric of the set of genomic metrics with a weight value determined based on a corresponding transcriptomic metric of the set of transcriptomic metrics; and generating the composite biomarker score using the weighted genomic metrics.
 16. A system comprising: one or more data processors; and a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform one or more operations comprising: accessing genomic data and transcriptomic data that were generated by processing a biological sample of a subject, wherein: the biological sample includes one or more cancer cells; the genomic data identifies one or more DNA sequences in the biological sample; and the transcriptomic data identifies one or more RNA sequences in the biological sample; generating, based on the genomic data, a set of genomic metrics, wherein each of the set of genomic metrics represents one or more characteristics corresponding to a corresponding DNA sequence of the one or more DNA sequences; generating, based on the transcriptomic data, a set of transcriptomic metrics, wherein each of the set of transcriptomic metrics represents one or more characteristics corresponding to a set of peptides that are translated from a corresponding RNA sequence of the one or more RNA sequences; identifying a composite biomarker score derived from the set of genomic metrics and the set of transcriptomic metrics; determining, based on the composite biomarker score, a predicted level of responsiveness of the subject to a particular type of an immunotherapy treatment; and outputting a result that corresponds to the predicted level of responsiveness of the subject.
 17. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to perform one or more operations comprising: accessing genomic data and transcriptomic data that were generated by processing a biological sample of a subject, wherein: the biological sample includes one or more cancer cells; the genomic data identifies one or more DNA sequences in the biological sample; and the transcriptomic data identifies one or more RNA sequences in the biological sample; generating, based on the genomic data, a set of genomic metrics, wherein each of the set of genomic metrics represents one or more characteristics corresponding to a corresponding DNA sequence of the one or more DNA sequences; generating, based on the transcriptomic data, a set of transcriptomic metrics, wherein each of the set of transcriptomic metrics represents one or more characteristics corresponding to a set of peptides that are translated from a corresponding RNA sequence of the one or more RNA sequences; identifying a composite biomarker score derived from the set of genomic metrics and the set of transcriptomic metrics; determining, based on the composite biomarker score, a predicted level of responsiveness of the subject to a particular type of an immunotherapy treatment; and outputting a result that corresponds to the predicted level of responsiveness of the subject.
 18. The computer-program product of claim 17, wherein generating the set of transcriptomic metrics comprises determining a quantitative or categorical metric that represents a predicted neoantigen burden of the biological sample.
 19. The computer-program product of claim 17, wherein generating the set of transcriptomic metrics comprises determining, based on the genomic data and the transcriptomic data, a quantitative or categorical metric that represents one or more characteristics of each of one or more candidate neoantigens detected from the biological sample.
 20. The computer-program product of claim 17, wherein generating the set of transcriptomic metrics comprises generating a quantitative or categorical metric that represents one or more characteristics of each of one or more HLA proteins for which a loss of cell-surface presentation is detected.